Preface to the first volume of Model Order Reduction


This is the first of the three-volume set Model Order Reduction intended to be used 
as a handbook in partial fulfilment of the goals of the COST Action EU-MORNET. The 
first two volumes deal with methods and algorithms, while the third and final volume 
is devoted to specific applications. Before discussing the contents of Volume 1 (for 
the contents of Volumes 2 and 3, see the respective editorials there), we would like to 
explain the background of this project. 


EU-MORNET: Model Order Reduction in Europe 

European researchers have recognized the importance of Model Order Reduction (MOR)
and reduced-order modeling since the 1990s, in the scientific computing
and computational engineering communities as well as in the area of systems and 
control. Since then, the interest has grown steadily, with many workshops and con- 
ferences having been organized, and several MOR research groups emerging. In the 
early 2000s, the first workshops were organized that brought researchers from these 
various areas together. This includes the 2003 workshop on “Dimension Reduction 
of Large-Scale Systems” at the Mathematical Research Center Oberwolfach and the 
2005 workshop “Model Order Reduction - Coupled Problems and Optimization” at 
the Lorentz Center in Leiden. Both inspired the publication of tutorial-style collections,
leading to two of the first books on MOR.¹,² At the same time, the first research
monograph fully dedicated to MOR appeared.³ In addition, comprehensive European
projects like CODESTAR (“Compact modelling of on-chip passive structures at high 
frequencies”, 2002-2004), CHAMELEON-RF (“Comprehensive High-Accuracy Mod- 
elling of Electromagnetic Effects in Complete Nanoscale RF Blocks”, 2004-2006), 
and O-MOORE-NICE! (“Operational MOdel Order REduction for Nanoscale IC Elec- 
tronics”, 2007-2010) made abundant use of MOR. The European Research Training 
Network COMSON (“Coupled Multiscale Simulation and Optimization in Nanoelec- 
tronics”, 2007-2009) also had a major task on MOR, and organized an autumn school 
on the Dutch island of Terschelling. It is still remembered by many participants, due 
to the nice food and luxurious accommodation, but also because many leading MOR 
researchers from all over the world were present. During this autumn school, there
was a first discussion on starting a European network on MOR, but due to the lack of
funding opportunities, there was no immediate follow-up.


1 Peter Benner, Volker Mehrmann, and Danny C. Sorensen (Eds.), Dimension Reduction of Large-Scale
Systems, Lecture Notes in Computational Science and Engineering, Vol. 45, Springer-Verlag,
Berlin/Heidelberg, 2005.

2 Wilhelmus H. Schilders, Henk van der Vorst, and Joost Rommes (Eds.), Model Order Reduction: Theory,
Research Aspects and Applications, Mathematics in Industry, Vol. 13, Springer-Verlag,
Berlin/Heidelberg, 2008.

3 Athanasios C. Antoulas, Approximation of Large-Scale Dynamical Systems, SIAM, Philadelphia,
2005.

In 2013, Peter Benner, chair of one of the MOR centers in Europe (the Max Planck 
Institute for Dynamics of Complex Technical Systems in Magdeburg), together with 
Albert Cohen (Paris), Mario Ohlberger (Münster), and Karen Willcox (then at MIT), organized
a workshop in the Luminy mathematics research centre CIRM, beautifully located
off the coast in the south of France, and this turned out to be the ideal setting
for the preparation of a so-called COST Action on MOR. The lectures during the day 
and the very pleasant atmosphere in the evenings put us in the right mood for writ- 
ing. The aim of the proposal was to “bring together all major groups in Europe working 
on a range of model reduction strategies with applications in many of the COST do- 
mains”. The proposal survived the first round, and was admitted to the second round, 
which meant going to Brussels for an interview with a very broad and general com- 
mittee. The overall chances of success were approximately 4 %, but we succeeded and 
hence EU-MORNET was born. The first management committee meeting took place in 
Brussels in April 2014, and since then many activities have been organized and under- 
taken. Highlights were the MoRePaS conferences in Trieste and Nantes, the Durham 
workshop in August 2017, organized jointly with the London Mathematical Society, 
and MODRED, held in 2017 in Odense. The network grew constantly, and when
the funded period of EU-MORNET ended in April 2018, more than 300 researchers
had joined the network. We hope to sustain this network, e.g., via its webpage
eu-mor.net, as coordination of activities has turned out to be very fruitful: it has put MOR
in the spotlight, and we observe that the interest in MOR is only growing. Many European
projects make use of it or emphasize its importance, like the recently ended
ECSEL project Delphi4LED. A glimpse at some of the various applications encompassing
MOR in their computational workflows is provided in Volume 3 of this handbook
project. We are very grateful to the COST Organization for supporting this initiative, 
thereby bringing MOR in Europe to the next level. This handbook also serves as the 
ultimate dissemination effort of EU-MORNET and will hopefully help generations of 
new researchers and practitioners to get a gentle introduction to the field and to find
inspiration for their own development and research work. 


Introduction to Volume 1 

This first volume starts with an introductory chapter to MOR in general as a very broad 
field of research, encompassing multiple techniques with applications in a wide va- 
riety of fields. This chapter serves two main purposes. On the one hand, it provides 
an introduction to the handbook project itself, helping the reader navigate through 
the three volumes, explaining their organization, providing pointers into the various 
chapters where specific methods are presented or where particular applications are 
further explored. Additionally, this first chapter also serves as a conduit to introduce
concepts and notation used throughout the various chapters and volumes, in an attempt
to support, simplify and enrich the reader’s experience when probing the infor-
mation provided in the three volumes of “Model Order Reduction”. 

After this initial, introductory chapter, all chapters of this first volume mostly 
focus on the concept of MOR applied in a system-theoretical context. The common 
principle among methods and algorithms in this setting is the basic assumption that 
there is an underlying system description whose behavior can be determined from the 
knowledge of the dynamics of a set of state variables. Specific developments both in 
theory and applications, including deployment in commercial CAD tools, took place 
over the years in specific settings and disciplines, sometimes using different language 
and notation. However, all such methods share a common framework, which we 
attempted to capture in this book. 

The second chapter in this volume, by T. Breiten and T. Stykel, is devoted to meth- 
ods associated with the concept of energy of a system and with the problem of how to 
represent it in balanced coordinates. This enables discarding the least relevant states 
from an input-output perspective. The resulting truncated system has several very in- 
teresting properties, which are discussed in the context of linear and nonlinear reduc- 
tion. 

The third chapter by L. Feng and P. Benner delves into the realm of moment- 
matching methods (also known as Padé-type approximations, relating to rational in- 
terpolation) as a metric for reduction, and details methods based on projection tech- 
niques for compressing linear, nonlinear and parametric systems. 

The next chapter, by P. Tiso et al., is devoted to modal truncation applied to linear
and nonlinear systems. This chapter discusses techniques based on analysis of
the system dynamics, in particular the observation of its eigenmodes and consequent 
truncation leading to reduced-order models. 

Enforcing specific desirable or required system properties after reduction is the 
target of the next chapter, by S. Grivet-Talocia and L. M. Silveira, which is devoted 
to post-processing techniques. In particular, the most prominent techniques for en- 
forcing passivity of linear systems via perturbation approaches are introduced and 
discussed. 

The following chapter serves as an interesting bridge between moment-matching
methods, described as rational interpolation, and data-driven interpolation techniques
that start from measurements of the system. This chapter,
by D. Karachalios, I. V. Gosea and A. C. Antoulas, introduces the Loewner framework
for system reduction and connects to moment matching, interpolation and projection.

The seventh chapter, by R. Zimmermann, continues the trend of discussing inter- 
polation methods, but it introduces manifold interpolation as a supporting tool in the 
reduction framework of parameterized systems. 

The final three chapters are entirely dedicated to exploring model reduction tech- 
niques fueled by data obtained from system behavior. 
The eighth chapter, by P. Triverio, discusses Vector Fitting, a data-driven algorithm
where samples or measurements of the system response are used to construct a re- 
duced representation. 

The ninth chapter, by G. Santin and B. Haasdonk, stays in the realm of data-driven 
reduction and introduces kernel methods as surrogate system models. It introduces a 
series of methods where the system representation is unknown or eschewed and a 
reduced representation is constructed or estimated from the information garnered by 
sampling the system or its outputs. 

Last but not least, the tenth and final chapter, by J. Kleijnen, presents Kriging
techniques: a set of data-driven interpolation techniques for generating a reduced
model through kernel regression assuming an underlying Gaussian distribution.

At this point, we would also like to thank all the contributing authors who brought
this project to life, the numerous anonymous reviewers who ensured the quality of the
30 chapters of the three volumes of the Model Order Reduction handbook series, and
last but not least Harshit Bansal, who helped with producing the index for each of
the three volumes. Our gratitude also goes to the De Gruyter staff, and in particular to
Nadja Schedensack, for accompanying this project constructively over more than four
years, with unprecedented patience.


1 Model order reduction: basic concepts
and notation


Abstract: This is the first chapter of a three-volume series dedicated to theory and ap- 
plication of Model Order Reduction (MOR). We motivate and introduce the basic con- 
cepts and notation, with reference to the two main cultural approaches to MOR: the 
system-theoretic approach employing state-space models and transfer function con- 
cepts (Volume 1), and the numerical analysis approach as applied to partial differen- 
tial operators (Volume 2), for which projection and approximation in suitable function 
spaces provide a rich set of tools for MOR. These two approaches are complementary 
but share the main objective of simplifying numerical computations while retaining 
accuracy. Despite the sometimes different adopted language and notation, they also 
share the main ideas and key concepts, which are briefly summarized in this chapter. 
The material is presented so that all chapters in this three-volume series are put into 
context, by highlighting the specific problems that they address. An overview of all 
MOR applications in Volume 3 is also provided. 


Keywords: model order reduction, (Petrov-)Galerkin projection, snapshots, parametric
operator equation, transfer function


MSC 2010: 35B30, 37M99, 41A05, 65K99, 93A15, 93C05 


1.1 Overview 


The ever-increasing demand for realistic simulations of complex products and pro- 
cesses places a heavy burden on the shoulders of mathematicians and, more gen- 
erally, researchers and engineers working in the area of computational science and 
engineering (CSE). Realistic simulations imply that the errors of the virtual models 
should be small, and that different aspects of the product or process must be taken 
into account, resulting in complex coupled simulations. 
Often, there is a lot of superfluous detail in these simulations that is not needed
to provide accurate results. With the current advances in new computer architectures, 
viz. the availability of many processors, one might be tempted to just use the abundant 
computational resources. However, this could lead to enormous energy consumption 
for the simulations, which should be avoided if possible. Besides, it could still lead 
to very lengthy and time-consuming simulations. Hence, it seems wise to use meth- 
ods that can reduce the size of such huge problems, and which are able to get rid 
of the superfluous and unnecessary detail, still guaranteeing the accuracy of solu- 
tions. 

An example is provided by the co-simulation of an electronic circuit with the in- 
terconnect structure that is mounted on top of the circuit to provide all desired con- 
nections. The metallic interconnect structure causes electromagnetic effects that may 
influence the behavior of the underlying circuit. There is no need, however, to solve
the Maxwell equations for this complex three-dimensional structure in full detail; only
the most dominant effects causing delays need to be included. This is where model order
reduction (MOR) comes into play: MOR methods are able to extract the dominant
behavior by reducing the size of the system to be solved.

To explain what MOR is, we often use the following picture: 


If we showed the picture on the right, everyone would recognize that this is a rabbit.
Hence, we do not need the detailed representation in the left picture to describe
the animal. Maybe we would need a slightly more detailed description as shown in
the middle picture, depending on the demands.¹ MOR works in the same way: the
original problem is reduced, the representation of the solution is given with far fewer
variables, and the hope is that this is sufficient to guarantee an accurate solution. If
more accuracy is needed, clearly the problem size should be reduced to a lesser extent.


1 Disclaimer: though our picture might indicate this, note that simply using a coarser mesh in a dis- 
cretization of a continuous model is not a competitive MOR technique in general! 

By now, MOR is a very active and relatively mature field of research. Much progress
has been made in the past 40 years, and in many different directions. The seminal
papers by Feldmann and Freund (1994/95) [9, 10], simultaneously with [12], sparked
many developments in the area of Krylov-type methods, strongly related to the field
of numerical linear algebra. Mimetic elements were considered, leading to passivity- 
preserving and structure-preserving methods, necessary to retain vital properties of 
the original system of equations in electronics and electromagnetics. First textbooks 
appeared with a special focus on applications in this area, [11] even before the above 
papers, and after the field had matured, new textbooks [13, 22] and several collections 
[3, 5] were published. 

Within the systems and control area, one concentrates mainly on balancing tech- 
niques, involving the solution of Lyapunov equations. The main ideas center on pre- 
serving those states in a dynamical system that can be reached with the least amount 
of energy, and on the other hand, those states that provide the most information when 
observed. In balanced coordinates, both sets of states coincide. Here, the seminal
paper [16], which rendered these ideas computationally feasible, has to be
mentioned. A first textbook in this area was published in 2001 by Obinata and Anderson
[17].

Also, the need for MOR was already discussed in the 1960s in mechanical engi- 
neering, from which the technique of modal truncation emerged, in combination with 
substructuring and with further developments like component mode synthesis. These 
are nowadays standard techniques, found in many variants in structural analysis soft- 
ware, and covered by many textbooks in numerical mechanics. A comprehensive text- 
book focusing on this area is [18]. 

First textbooks and collections of tutorials that made connections between the 
MOR techniques developed mainly in the above-mentioned application areas started 
appearing in the mid-2000s, including the fundamental textbook [1] by Antoulas in 
2005, and the edited volumes [6, 21]. 

Later, researchers started to consider parametric model order reduction, espe- 
cially within the area of reduced basis methods, focusing on the fast solution of para- 
metric partial differential equations (PDEs) [14, 19, 20]. A related, but somewhat dif- 
ferent approach to parametric PDEs was developed in the framework of the proper 
generalized decomposition [8]. One can also find a collection of articles on MOR for 
parametric problems in [7]. 

Methods for nonlinear problems were also considered, important developments 
being the empirical interpolation methods and other so-called hyperreduction tech- 
niques. But also methods like proper orthogonal decomposition (POD), where snap- 
shots of the solution of a nonlinear problem are used to create a basis for solutions, 
became popular for nonlinear problems. Basic concepts of these approaches, includ- 
ing also the MOR techniques already mentioned above, also with some historical per- 
spectives, can be found in the collection of tutorials [4]. 
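
As a rough illustration of the snapshot idea behind POD, the following sketch collects snapshots as columns of a matrix and takes the leading left singular vectors as a basis. The SVD-based construction is the common textbook recipe rather than something prescribed in this chapter, and the synthetic two-mode snapshot data and the name `pod_basis` are assumptions made only for this example.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Return the r leading POD modes and all singular values.

    snapshots: array of shape (n, N) whose columns are solution snapshots.
    """
    U, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r], sigma

# Synthetic snapshots (assumed data): two spatial modes with decaying amplitudes.
x = np.linspace(0.0, 1.0, 101)
t = np.linspace(0.0, 1.0, 20)
snapshots = (np.outer(np.sin(np.pi * x), np.exp(-t))
             + 0.1 * np.outer(np.sin(2 * np.pi * x), np.exp(-4 * t)))

V, sigma = pod_basis(snapshots, r=2)

# Relative error of projecting the snapshots onto the 2-dimensional POD space.
projection_error = (np.linalg.norm(snapshots - V @ (V.T @ snapshots))
                    / np.linalg.norm(snapshots))
```

Because the synthetic data has exactly two spatial modes, two POD vectors capture it up to rounding; for real snapshot data the decay of `sigma` would typically guide the choice of `r`.
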
The demand for more realistic simulations led to the development of MOR meth-
ods for interconnected and coupled systems. Extensions to descriptor systems, or al- 
ternatively differential-algebraic systems, led to the creation of index-preserving meth- 
ods and to the development of an interpolatory projection framework for model reduc- 
tion of descriptor systems. An entirely different approach is provided by data-driven 
methods, in which the Loewner framework plays an important role; see the recent 
textbook on the interpolatory framework including the Loewner framework for more 
details [2]. 

Most recently, focus also turned to data-driven and non-intrusive MOR methods, 
requiring no or only incomplete knowledge of the physical model, but merely relying 
on tools or software to produce relevant data from which a model description can be 
inferred. One prominent technique in this area is dynamic mode decomposition [15], 
with many new methods emerging even more recently, often making connections to 
techniques from machine and deep learning. 

The three volumes constituting this handbook of Model Order Reduction discuss 
many of the aforementioned developments and methods. This first volume contains 
theoretical expositions of system-theoretic, interpolatory, and data-driven methods 
and algorithms. The second volume treats snapshot-based methods and algorithms for 
parametric PDEs. The mathematical strategy behind these methods relies on Galerkin 
projection on finite-dimensional subspaces generated by snapshot solutions corre- 
sponding to a specific choice of parameters. The third volume contains a large variety 
of applications of MOR. Originally, the fields of mechanical engineering, automation 
and control, as well as the electronics industry were the main driving forces for the de- 
velopments of MOR methods, but in recent years, MOR has been introduced in many 
other fields (not all covered in Volume 3, though), like chemical and biomedical en- 
gineering, life sciences, geosciences, finance, fluid mechanics and aerodynamics, to 
name a few. Moreover, Volume 3 also provides a chapter surveying the landscape of 
existing MOR software. 


1.2 A quick tour 


MOR is a multidisciplinary topic, which has been developed over the last decades 
by mathematicians, scientists and engineers from widely different communities. Al- 
though the main ideas in MOR can be classified in a relatively small set of fundamen- 
tal problem statements and related reduction approaches, these ideas have been de- 
veloped by different communities with different languages, notation, and scope. One 
of the purposes of the Model Order Reduction handbook project is in fact to provide 
a comprehensive overview of MOR approaches, hopefully forming bridges that cross 
different disciplines. 
Several classifications can be attempted in the world of MOR. Probably the most
natural high-level classification distinguishes between the two main cultural ap- 
proaches of system theory on one side, and numerical analysis as applied to solving 
PDEs on the other. Other classifications may be considered, for instance based on the 
various classes of reduction approaches, which may be model-driven or data-driven, 
optimal or heuristic, deterministic or stochastic, or alternatively on the type of the 
system being addressed, which can be linear or nonlinear, uniquely defined or pa- 
rameterized by some geometrical or material variable, deterministic or stochastic, 
finite- or infinite-dimensional. The specific methods that apply in each of these cases 
will be discussed in detail in the various chapters of this three-volume series. In this 
initial chapter we mainly distinguish the two major cultural approaches to MOR, for 
which reduction methods, notation and language are sometimes quite different in the 
existing literature. 

System-theoretic approaches usually deal with a system under investigation de- 
scribed as a large-scale set of Ordinary Differential Equations (ODEs) or Differential- 
Algebraic Equations (DAEs), whose dynamics is expressed in terms of a set of state 
variables. The main objective is to derive some compact Reduced-Order Model (ROM) 
with the same structure, characterized by a significantly smaller number of states, and 
whose response approximates the true system response according to well-defined cri- 
teria. Very often the ROM represents a component of a larger system that is impos- 
sible or impractical to solve in its full-size formulation. In this setting, reduction is 
required in order to replace the original large-scale description of individual compo- 
nents with accurate and robust ROMs, so that a global system-level numerical simu- 
lation becomes feasible. 

A second major approach to MOR addresses fast numerical solution of PDEs. In 
this setting, the field problem of a PDE is taken as the starting point. In the snapshot- 
based methods, the full-order variational form is often retained by the MOR process. 
This allows projection-based methods to utilize the parametric operator equations 
and define a reduced-order operator of the same parametric dependency. Since the 
starting point is the PDE form, a discretization in space and time is required, leading 
to a large-scale discretized model. Various methods exist to control the approximation
error, often balancing rigor with computability.

We see that the above two approaches share their main objective of speeding up 
numerical computations with control over approximation errors, although the starting 
points are different. We should, however, consider that the two approaches practically 
coincide once a field problem described in terms of PDEs is discretized in its space 
variables in terms of suitable coefficients, which basically play the same role as the
state variables in system-theoretical approaches. 
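
To make this remark concrete, the sketch below semi-discretizes a one-dimensional heat equation, u_t = u_xx on (0, 1), by finite differences, so that the nodal values become the state variables of a system of ODEs. The choice of PDE, the homogeneous Dirichlet boundary conditions, the uniform heat source as input, the mean temperature as output, and the name `heat_state_space` are all assumptions made for illustration only.

```python
import numpy as np

def heat_state_space(n):
    """Finite-difference semi-discretization of u_t = u_xx on (0, 1).

    Assumed setup: homogeneous Dirichlet boundary conditions, a uniform grid
    with n interior nodes, a scalar input acting as a uniform heat source,
    and the mean nodal temperature as the single output.
    """
    h = 1.0 / (n + 1)
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    A = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    B = np.ones((n, 1))        # input matrix: uniform source
    C = np.ones((1, n)) / n    # output matrix: average of the nodal values
    return A, B, C

A, B, C = heat_state_space(50)

# A is symmetric, so eigvalsh applies; all eigenvalues should be negative.
eigs = np.linalg.eigvalsh(A)
```

The nodal values play the role of the state x(t), and the resulting triple (A, B, C) is a state-space model whose dimension grows with the mesh resolution, which is what makes reduction attractive.
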

We address the two approaches in Sections 1.2.1 and 1.2.2, by introducing basic 
notation and concepts that will be used extensively throughout Volumes 1 and 2 of 
this book series. Section 1.2.3 provides a glimpse at the MOR applications that are ex- 
tensively discussed in Volume 3. 


1.2.1 The system-theoretic approach


System-theoretic approaches consider models whose dynamics is expressed in terms
of internal state variables, which in the finite-dimensional setting are denoted as
$x(t) \in \mathcal{X} \subset \mathbb{R}^n$. These states evolve with time $t \in [t_0, T]$ according to dynamic equations
which are driven by some inputs or control signals $u(t) \in \mathcal{U} \subset \mathbb{R}^m$, whereas the quantities
of interest or outputs are $y(t) \in \mathcal{Y} \subset \mathbb{R}^q$, usually with $q \ll n$. Denoting the “true”
system as $\mathcal{S}$, the MOR objective is to obtain an approximate system $\hat{\mathcal{S}}$ with a small
number $r \ll n$ of internal states $\hat{x}(t) \in \hat{\mathcal{X}} \subset \mathbb{R}^r$. Reduction is conducted by enforcing
appropriate approximation conditions such that, given input signals $u(t)$, the ROM $\hat{\mathcal{S}}$
provides an output $\hat{y}(t)$ that is “close” in some sense to the corresponding output $y(t)$
of the original system $\mathcal{S}$.


1.2.1.1 Standard system descriptions: the LTI case 


The simplest system description assumes Linearity and Time-Invariance (LTI) and is 
provided by a set of ODEs in state-space form 


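\[
\begin{aligned}
\dot{x}(t) &= A x(t) + B u(t), \qquad x(t_0) = x_0, \\
y(t) &= C x(t) + D u(t),
\end{aligned}
\tag{1.1}
\]


where $\dot{x}(t)$ denotes the time derivative of $x(t)$, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{q \times n}$,
$D \in \mathbb{R}^{q \times m}$ are constant matrices, and $x_0$ is a prescribed initial condition. A more general
formulation of LTI system dynamics can be expressed in descriptor form,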


\[
\begin{aligned}
E \dot{x}(t) &= A x(t) + B u(t), \qquad x(t_0) = x_0, \\
y(t) &= C x(t) + D u(t),
\end{aligned}
\tag{1.2}
\]


where an additional and possibly singular matrix $E \in \mathbb{R}^{n \times n}$ enters the state equations.
Casting (1.2) in the Laplace domain and assuming vanishing initial conditions, $x_0 = 0$,
leads to


\[
Y(s) = H(s) U(s), \qquad H(s) = C (sE - A)^{-1} B + D, \tag{1.3}
\]


where $H(s)$ is the transfer function of the system and $s \in \mathbb{C}$ is the Laplace variable.
Well-posedness of (1.3) requires that $\det(sE - A) \neq 0$ for some $s$, i.e., that the pencil
$(A, E)$ is regular. In most cases also an (asymptotic) stability requirement is established,
so that all finite eigenvalues of the pencil $(A, E)$ have a (strictly) negative real
part.

This system description forms the basis of most of the following chapters in this 
volume. 
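
As a small numerical companion to (1.3), the following sketch evaluates the transfer function at a given complex frequency by a single linear solve instead of forming the inverse explicitly. The function name `transfer_function` and the two-state example matrices are hypothetical and chosen only so that the result can be checked by hand.

```python
import numpy as np

def transfer_function(E, A, B, C, D, s):
    """Evaluate H(s) = C (sE - A)^{-1} B + D using one linear solve."""
    return C @ np.linalg.solve(s * E - A, B) + D

# Hypothetical 2-state example with H(s) = 1/(s + 1) + 1/(s + 2).
E = np.eye(2)
A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[1.0, 1.0]])
D = np.zeros((1, 1))

H0 = transfer_function(E, A, B, C, D, 0.0)  # value at s = 0 (DC gain)
Hj = transfer_function(E, A, B, C, D, 1j)   # value on the imaginary axis, s = j
```

For large sparse systems one would normally replace the dense solve by a sparse factorization, but the structure of the computation stays the same.
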


1.2.1.2 Approximation criteria


Some common approximation criteria that are appropriate for LTI systems are listed 
now: 


— The quantities of interest of both the full-scale system $\mathcal{S}$ and the reduced system $\hat{\mathcal{S}}$ are
the outputs $y(t)$ and $\hat{y}(t)$, respectively. Therefore, it is natural to bound the output
error defined as $\|y - \hat{y}\|_{\mathcal{L}}$ within a suitable function space $\mathcal{L}$, with the natural choice
being the Hilbert space of square integrable signals $L_2(t_0, T)$, with


\[
\|y\|_{L_2(t_0, T)}^2 = \int_{t_0}^{T} \|y(t)\|_2^2 \, dt. \tag{1.4}
\]


— An alternative is to control the error of the ROM transfer function $\hat{H}(s)$ by minimizing
$\|H - \hat{H}\|_{\mathcal{H}}$, where $\mathcal{H}$ is an appropriate function space. Common choices
are the Hardy spaces $\mathcal{H}_2$ and $\mathcal{H}_\infty$, which are adequate under asymptotic stability
assumptions, for which


\[
\|H\|_{\mathcal{H}_2}^2 = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \|H(j\omega)\|_F^2 \, d\omega,
\qquad
\|H\|_{\mathcal{H}_\infty} = \sup_{s \in \mathbb{C}_+} \|H(s)\|_2, \tag{1.5}
\]

where $F$ denotes the Frobenius norm and $j = \sqrt{-1}$. We refer to Chapter 2 in this
volume for more precise definitions and for an introduction of the main system-theoretic
properties that are relevant for error control in MOR.

— Data-driven approaches aim at enforcing suitable interpolation or approximation
conditions starting from available samples of the original transfer function $H_k = H(s_k)$
at a set of complex frequencies $s_k$ for $k = 1, \dots, \bar{k}$. Interpolation methods (see
Chapter 6, where the Loewner framework is introduced and discussed) enforce


\[
\hat{H}(s_k) = H_k, \qquad \forall k = 1, \dots, \bar{k}, \tag{1.6}
\]


possibly extending this exact matching also to higher derivatives


\[
\left. \frac{d^{\nu} \hat{H}}{ds^{\nu}} \right|_{s = s_k} = \left. \frac{d^{\nu} H}{ds^{\nu}} \right|_{s = s_k},
\qquad \forall \nu = 0, \dots, \bar{\nu}, \quad \forall k = 1, \dots, \bar{k}, \tag{1.7}
\]


giving rise to so-called moment-matching methods (see Chapter 3). In some cases,
the point and moment matching is performed at adaptively selected frequencies
$s_k \in \mathbb{C}$; see, e.g., the IRKA algorithm in Chapter 3. Moments can also be matched
implicitly through projection of the original system onto suitably-defined Krylov
subspaces, also discussed in Chapter 3.

— A relaxed version of the above matching conditions involves minimization of the
least squares error,


\[
\sum_{k=1}^{\bar{k}} \|\hat{H}(s_k) - H_k\|_F^2. \tag{1.8}
\]

Curve fitting approaches, including the Vector Fitting (VF) methods (see Chapter 8),
fall into this class. When the data $H_k$ come from measurements, only purely
imaginary frequencies $s_k = j\omega_k$ are available and used.

— A fundamental class of system-theoretic approaches for MOR is based on truncation
of state-space or descriptor systems, where those state variables that are
poorly coupled to the inputs or that contribute negligibly to the outputs
are discarded. Balanced truncation methods (see Chapter 2) and modal methods
(Chapter 4) belong to this class.

— Some applications require additional constraints to be enforced during reduction.
A notable case is the enforcement of passivity and of dissipativity, which are appropriate
for systems that are unable to generate energy on their own. Dissipativity
conditions for state-space systems are reviewed in Chapters 2 and 5, together with
appropriate methods for their enforcement, either as a feature of the MOR scheme
or as a postprocessing step.
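
The interpolation and moment-matching conditions listed above can be illustrated with a minimal projection sketch. It assumes a single-input system, a real expansion point `s0`, and a one-sided (Galerkin) projection onto a rational Krylov subspace; the function names and the diagonal test system are hypothetical, and this is only one of several constructions discussed in Chapter 3, not a reference implementation.

```python
import numpy as np

def transfer_function(E, A, B, C, s):
    """H(s) = C (sE - A)^{-1} B (feedthrough term omitted for brevity)."""
    return C @ np.linalg.solve(s * E - A, B)

def one_sided_krylov_rom(E, A, B, C, s0, r):
    """Galerkin projection onto span{M^{-1}B, (M^{-1}E) M^{-1}B, ...},
    with M = s0*E - A, using r Krylov vectors (single input assumed)."""
    M = s0 * E - A
    v = np.linalg.solve(M, B)
    cols = [v]
    for _ in range(r - 1):
        v = np.linalg.solve(M, E @ v)
        cols.append(v)
    # Orthonormalize the basis for numerical robustness.
    V, _ = np.linalg.qr(np.hstack(cols))
    return V.T @ E @ V, V.T @ A @ V, V.T @ B, C @ V

# Hypothetical stable test system with n = 30 states.
n = 30
E = np.eye(n)
A = -np.diag(np.arange(1.0, n + 1))
B = np.ones((n, 1))
C = np.ones((1, n))

s0 = 1.0
Er, Ar, Br, Cr = one_sided_krylov_rom(E, A, B, C, s0, r=4)

# Mismatch between full and reduced transfer function at the expansion point.
err_at_s0 = abs(transfer_function(E, A, B, C, s0)
                - transfer_function(Er, Ar, Br, Cr, s0)).item()
```

The reduced model reproduces the full transfer function at `s0` up to rounding, and with `r` Krylov vectors it also matches the first derivatives there, which is the sense in which moments are matched implicitly through projection.
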


1.2.1.3 Parameterized LTI systems 


An additional layer of complexity is introduced by allowing the system $\mathcal{S}$ to be parameterized
by some deterministic and/or stochastic variables $\mu \in \mathcal{P} \subset \mathbb{R}^d$. Assuming that
the input signals $u$ are not parameter-dependent, we can write (1.2) in the parameterized
form


\[
\begin{aligned}
E(\mu) \dot{x}(t, \mu) &= A(\mu) x(t, \mu) + B(\mu) u(t), \qquad x(t_0, \mu) = x_0(\mu), \\
y(t, \mu) &= C(\mu) x(t, \mu) + D(\mu) u(t),
\end{aligned}
\tag{1.9}
\]

with the corresponding transfer function

\[
H(s, \mu) = C(\mu) \bigl( s E(\mu) - A(\mu) \bigr)^{-1} B(\mu) + D(\mu). \tag{1.10}
\]


In this parameterized setting, one is usually interested in preserving a closed-form
parameterization also in the ROM, so that the corresponding transfer function must
match (1.10) not only throughout the frequency band of interest, but also throughout
the parameter space. Chapter 3 provides an overview of moment-matching parameterized
MOR (PMOR) in the case of affine dependence of $E(\mu)$ and $A(\mu)$ on the parameters.
The so-called reduced basis methods discussed in Chapter 4 of Volume 2
provide the counterpart of PMOR in the PDE reduction setting, which is extensively
treated in all chapters of Volume 2.
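
The following sketch illustrates why affine parameter dependence is convenient: each affine term is projected once, and the ROM keeps the same closed-form dependence on the parameter. The scalar parameter, the affine splitting E(mu) = E0 + mu E1 and A(mu) = A0 + mu A1, the sampled basis construction, and all names are assumptions for illustration; they are not taken from Chapter 3.

```python
import numpy as np

def transfer_function(E, A, B, C, s):
    return C @ np.linalg.solve(s * E - A, B)

# Hypothetical affine model: E(mu) = E0 + mu*E1, A(mu) = A0 + mu*A1.
n = 20
E0, E1 = np.eye(n), 0.1 * np.eye(n)
A0, A1 = -np.diag(np.arange(1.0, n + 1)), -np.eye(n)
B = np.ones((n, 1))
C = np.ones((1, n))

# Basis from solves at one frequency and a few sampled parameter values.
s0 = 1.0
mu_samples = [0.0, 1.0, 2.0]
cols = [np.linalg.solve(s0 * (E0 + mu * E1) - (A0 + mu * A1), B)
        for mu in mu_samples]
V, _ = np.linalg.qr(np.hstack(cols))

# Project every affine term once; the ROM remains affine in mu.
E0r, E1r = V.T @ E0 @ V, V.T @ E1 @ V
A0r, A1r = V.T @ A0 @ V, V.T @ A1 @ V
Br, Cr = V.T @ B, C @ V

def H_full(s, mu):
    return transfer_function(E0 + mu * E1, A0 + mu * A1, B, C, s)

def H_rom(s, mu):
    return transfer_function(E0r + mu * E1r, A0r + mu * A1r, Br, Cr, s)

# The ROM interpolates the full model at the sampled pairs (s0, mu_i).
gap = max(abs(H_full(s0, mu) - H_rom(s0, mu)).item() for mu in mu_samples)
```

Evaluating `H_rom` for a new parameter value only involves the small projected matrices, so no operation of the full dimension `n` is needed after the projection step.
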


1.2.1.4 Nonlinear systems 


Generalization to nonlinear systems is also possible, although effectiveness of MOR 
strongly depends on the class of systems being considered. Several results are available
for systems that can be cast in the form


\[
\begin{aligned}
\dot{x}(t) &= f(x(t)) + g(x(t)) \, u(t), \\
y(t) &= h(x(t)),
\end{aligned}
\tag{1.11}
\]


where $f : \mathbb{R}^n \to \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ and $h : \mathbb{R}^n \to \mathbb{R}^q$ are smooth functions. A notable
particular case is the quadratic-bilinear form, for which the nonlinear functions
can be written and/or approximated as quadratic polynomials in their variables and
compactly expressed, e.g., as


\[
f(x) = f(0) + A_1 x + A_2 (x \otimes x), \tag{1.12}
\]


where $\otimes$ is the Kronecker product and $f(0) \in \mathbb{R}^n$, $A_1 \in \mathbb{R}^{n \times n}$, $A_2 \in \mathbb{R}^{n \times n^2}$ are constant
matrices. A discussion of methods applicable to MOR of such systems is available in
Chapters 2 and 3.
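
The quadratic form (1.12) can be evaluated and projected with a few lines of code. The sketch below assumes a Galerkin projection with an orthonormal basis V and uses random test data; the function names are hypothetical. It relies on the Kronecker identity (V z) ⊗ (V z) = (V ⊗ V)(z ⊗ z), which lets the reduced quadratic term be precomputed.

```python
import numpy as np

def quadratic_rhs(f0, A1, A2, x):
    """Evaluate f(x) = f(0) + A1 x + A2 (x kron x)."""
    return f0 + A1 @ x + A2 @ np.kron(x, x)

def project_quadratic(V, f0, A1, A2):
    """Galerkin projection of the quadratic model onto range(V).

    The reduced quadratic term has shape (r, r^2).
    """
    return V.T @ f0, V.T @ A1 @ V, V.T @ A2 @ np.kron(V, V)

# Random test data (assumed, for illustration only).
rng = np.random.default_rng(0)
n, r = 6, 2
f0 = rng.standard_normal(n)
A1 = rng.standard_normal((n, n))
A2 = rng.standard_normal((n, n * n))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

f0r, A1r, A2r = project_quadratic(V, f0, A1, A2)

# For states in range(V), the reduced model equals the projected full model.
z = rng.standard_normal(r)
lhs = V.T @ quadratic_rhs(f0, A1, A2, V @ z)
rhs = quadratic_rhs(f0r, A1r, A2r, z)
```

Forming `np.kron(V, V)` explicitly costs memory of order n² r², so this direct construction is only meant for small examples.
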

In more general settings, supporting algorithms providing interpolation/approx- 
imation of high-dimensional nonlinear multivariate functions are indeed available. 
We mention the Empirical Interpolation Methods in their various formulations intro- 
duced in Chapter 1 of Volume 2 and manifold interpolation (Chapter 7 in this volume), 
which provides a general framework for interpolation of orthogonal bases, subspaces 
or positive definite system matrices. Both these approaches are recurrent tools in sev- 
eral modern MOR frameworks. 


1.2.1.5 Surrogate modeling 


Extending the framework of classical MOR, which in the system-theoretic approach 
is usually applied to a state-space description, surrogate modeling approaches
provide tools for processing sequences of input-output data points and constructing an
approximate metamodel that explains and reproduces their relationship. The last two 
chapters in this volume describe two alternative approaches for surrogate modeling. 
Chapter 9 presents an overview of the celebrated kernel methods, an approach that
is very popular in the machine learning community, both for acceleration of complex
simulation models and for classification and signal processing. Chapter 10
discusses Kriging methods or Gaussian Processes (GPs), with emphasis on design and
analysis of computer experiments. These extensions of MOR bridge the gap between
control and system theory on the one hand and statistics, computer science, and (big)
data science on the other, further demonstrating how pervasive the key objectives are
that characterize MOR.


1.2.2 The PDE approach 


The second major approach to MOR starts from a field problem defined over a continuous
domain $\Omega$. Thus, a parametric PDE is given as starting point of the MOR procedure.
Two main steps are performed: the numerical discretization in space and time and the
projection of the discretized form onto a reduced-order space. The projection space is 
chosen such that the field variable is well approximated in a natural PDE norm or it 
is chosen with respect to a given output quantity of interest. These basic tools are dis- 
cussed in more detail in Chapter 1 of Volume 2. 

The variational or weak form of a parametric linear PDE in the continuous setting
is posed over a suitable Hilbert space $V(\Omega)$ and given as


$$a(u(\mu), v; \mu) = f(v; \mu) \quad \forall v \in V, \tag{1.13}$$


with bilinear form $a : V \times V \times \mathcal{P} \to \mathbb{R}$ and linear form $f : V \times \mathcal{P} \to \mathbb{R}$. The parameter
vector is denoted $\mu$ and is an element of the parameter space $\mathcal{P}$. In many application
scenarios, a particular output of interest $s : \mathcal{P} \to \mathbb{R}$ is sought, given by the linear form
$l : V \times \mathcal{P} \to \mathbb{R}$ as


$$s(\mu) = l(u(\mu); \mu). \tag{1.14}$$


The case of a coercive and continuous bilinear form is the setting for many intro- 
ductory examples but does not cover all PDE settings. E. g., in electromagnetics, i. e., 
when solving Maxwell’s equations, an inf-sup stable sesquilinear form is often con- 
sidered. In unsteady problems, the time-dependence is often made explicit and time 
is treated differently from other parameters in the ROM setting; see the POD-greedy al- 
gorithm for example. Nonlinear problems require particular care and methods, which 
are often adapted to the particular type of nonlinearity. 

A suitable discretization method is chosen to approximate the field variable $u$,
defining a corresponding discrete space $V_h$. The method of weighted residuals is
invoked to turn the continuous form (1.13) into a discrete variational formulation.

The weak form in the discrete setting is given as


$$a(u_h(\mu), v_h; \mu) = f(v_h; \mu) \quad \forall v_h \in V_h, \tag{1.15}$$


with bilinear form $a : V_h \times V_h \times \mathcal{P} \to \mathbb{R}$ and linear form $f : V_h \times \mathcal{P} \to \mathbb{R}$. The space of
all $v_h$ is the test space, while the space of $u_h$ is the trial space.

A discrete solution is found by invoking Galerkin orthogonality, by enforcing that 
the test space is orthogonal to the residual. In Ritz—Galerkin methods, the residual is 
tested against the same set of functions as the ansatz functions, i.e., the test space is 
the same as the ansatz or trial space. In a more general Petrov-Galerkin method, test 
space and trial space are chosen as different spaces. 

Starting from the discrete high-fidelity formulation (1.15), another Galerkin
projection is invoked to arrive at the reduced-order formulation. A set of solutions is
computed at parameter values $S_N = \{\mu_1, \ldots, \mu_N\}$, either by pre-specifying $S_N$ or
by using an iterative algorithm such as greedy sampling. These solutions are often
called 'snapshots'. A projection space $V_N$ is determined by a suitable method. The
different methods are briefly introduced below and discussed in much detail in dedicated
chapters of Volume 2.

The reduced-order variational formulation is to determine $u_N(\mu) \in V_N$ such that


$$a(u_N(\mu), v_N; \mu) = f(v_N; \mu) \quad \forall v_N \in V_N. \tag{1.16}$$


Let $A_h$ denote the matrix assembling the bilinear form and $f_h$ the load vector, and let
$V \in \mathbb{R}^{N_h \times N}$, with $N_h$ the dimension of $V_h$, denote the matrix of basis vectors derived
from the snapshot solutions. Projecting (1.15) onto the reduced-order space gives


$$V^T A_h V u_N = V^T f_h. \tag{1.17}$$

The high-order solution is then approximated as

$$u_h \approx V u_N. \tag{1.18}$$
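
The projection step (1.17)-(1.18) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the toy operator `K + mu * M` (a finite-difference diffusion-reaction problem with affine parameter dependence), the training set, and the choice of a POD basis with eight modes. It is not the construction used in any particular chapter.

```python
import numpy as np

def pod_basis(snapshots, n_modes):
    """Leading left singular vectors of the snapshot matrix (one snapshot per column)."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :n_modes], s

def galerkin_rom_solve(A_h, f_h, V):
    """Solve V^T A_h V u_N = V^T f_h, cf. (1.17), and lift back via u_h ~ V u_N, cf. (1.18)."""
    A_N = V.T @ A_h @ V
    f_N = V.T @ f_h
    u_N = np.linalg.solve(A_N, f_N)
    return V @ u_N

# Hypothetical high-fidelity model: A_h(mu) = K + mu * M, affine in the parameter mu.
n = 200
h = 1.0 / (n + 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
M = h * np.eye(n)
f_h = h * np.ones(n)

def assemble(mu):
    return K + mu * M

# Snapshots at pre-specified parameter values (uniform sampling, an assumption).
train = np.linspace(0.1, 100.0, 20)
S = np.column_stack([np.linalg.solve(assemble(mu), f_h) for mu in train])
V, sigma = pod_basis(S, n_modes=8)

# Compare the high-fidelity and the reduced solution at an unseen parameter value.
mu_test = 37.0
u_h = np.linalg.solve(assemble(mu_test), f_h)
u_rom = galerkin_rom_solve(assemble(mu_test), f_h, V)
rel_err = np.linalg.norm(u_h - u_rom) / np.linalg.norm(u_h)
```

In this sketch the reduced system has dimension 8 instead of 200. Because the operator is affine in `mu`, the projected matrices `V.T @ K @ V` and `V.T @ M @ V` could be precomputed once, which is the idea behind the offline-online decomposition.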


Typical ROM ingredients are an affine parameter dependency, an offline-online
decomposition and error bounds, which are explained in Chapter 1 of Volume 2.
The remainder of this section gives pointers to the chapters of Volume 2 that treat
accurate ROMs in the PDE setting. Each chapter explains in detail a different method of
obtaining the projection spaces or follows an alternative route altogether. Numerical
examples can be found in the respective chapters.
— Proper Orthogonal Decomposition 
In the Proper Orthogonal Decomposition (POD), the projection space is deter- 
mined from the principal modes of the singular value decomposition of sampled 
field solutions. The sampling is uniform over the parameter domain in many 
cases. POD is covered in depth in Chapter 2 of Volume 2. 
— Proper Generalized Decomposition 
The Proper Generalized Decomposition (PGD) assumes a separated representa- 
tion, in which all variables, i.e., space, time and parameters, can be treated in 
the same way; see Chapter 3 of Volume 2. Error indicators and error bounds serve 
to iteratively build the approximation spaces. 
- Reduced Basis Method 
Reduced Basis (RB) MOR uses residual-based error indicators and error estimators 
to determine the projection space by a greedy sampling; see Chapter 4 of Volume 2. 
It is not uncommon in the literature to consider POD as an RB method.
—  Hyperreduction 
Hyperreduction techniques are related to the Empirical Interpolation Method 
(EIM) which generally aims to approximate an affine parameter dependency for 
an originally non-affine problem. The EIM is introduced in Chapter 1 of Volume 2 
while the chapter on hyperreduction (Chapter 5 of Volume 2) details how these 
techniques can be used for ROM generation. 




- Localized Model Order Reduction 
The localized model reduction aims to determine local ROMs valid over parts of 
the computational domain and construct a global approximation through suitable 
couplings of local ROMs. The localized ROMs are usually generated with POD and 
RB techniques; see Chapter 6 of Volume 2. 

— Dynamic Mode Decomposition 
The Dynamic Mode Decomposition (DMD) is also based on the singular value
decomposition; see Chapter 7 of Volume 2. The starting point is a set of measurements
of the time trajectory, from which the time-advancement operator is approximated.
The DMD is thus understood as a data-driven approach, since it does not project
an affinely expanded system matrix.
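
As an illustration of the data-driven character of DMD, the following minimal sketch fits a time-advancement operator to a snapshot sequence using the SVD. It implements the commonly used "exact DMD" variant, which is an assumption here and not necessarily the formulation of Chapter 7 of Volume 2. The synthetic data (a two-dimensional linear recursion embedded in 50 dimensions) is hypothetical.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a snapshot sequence x_0, ..., x_m stored column-wise.

    Returns approximate eigenvalues and modes of the linear operator that
    best maps x_k to x_{k+1} in the least-squares sense.
    """
    X0, X1 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    X1_V_Sinv = (X1 @ Vh.conj().T) / s       # X1 V Sigma^{-1} (column scaling)
    A_r = U.conj().T @ X1_V_Sinv             # operator projected onto the POD modes
    eigvals, W = np.linalg.eig(A_r)
    modes = X1_V_Sinv @ W                    # exact DMD modes
    return eigvals, modes

# Hypothetical measurements: z_{k+1} = A_small z_k, observed as x_k = Q z_k.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 2)))
A_small = np.array([[0.9, 0.2], [0.0, 0.5]])
z = np.array([1.0, 1.0])
cols = []
for _ in range(15):
    cols.append(Q @ z)
    z = A_small @ z
snapshots = np.column_stack(cols)

eigvals, modes = dmd(snapshots, rank=2)
```

Only the measured trajectory enters the computation; no system matrix is assembled or projected. For this noise-free data the computed eigenvalues should coincide with those of `A_small`, namely 0.9 and 0.5.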


1.2.3 Applications 


In this section, we briefly introduce the several MOR applications that are collected 
in the third volume of this book series. Several early developments in MOR originated 
in the exponentially growing field of microelectronics during the last few decades of 
the 20th century. The enormous growth in complexity in designing microprocessors 
and computing systems required scalable, efficient, and especially automated
design and verification methods. This necessity provided a fertile ground for research 
on MOR, so that many contributors from mathematics, system and control theory, and 
electronics engineering proposed several key ideas and algorithms that are still widely 
adopted in modern tools. Chapter 4 of Volume 3 reviews some of these steps and pro- 
vides an overview of MOR applications in microelectronics. It is not a surprise that 
MOR proves very successful also in electromagnetics, since electric/electronic circuits 
are just a lumped approximation of the more general Maxwell’s field formulations. Ap- 
plications of MOR in electromagnetics are discussed in detail in Chapter 5 of Volume 3. 

Not long after the initial developments, the MOR field became more and more 
mature, with consolidated approaches both in the system-theoretic and in the PDE 
communities. This enabled reaching cross- and multidisciplinary applications. Vol- 
ume 3 of this book series reports on several such applications of MOR, in particu- 
lar: chemical process optimization (Chapter 1 of Volume 3), mechanical engineering 
(Chapter 2 of Volume 3), acoustics and vibration (Chapter 3 of Volume 3),
computational aerodynamics (Chapter 6 of Volume 3) and fluid dynamics (Chapter 9 of
Volume 3). These chapters build on the methods discussed in the first two volumes, in
some cases proposing application-driven customized versions, and testify to the
pervasiveness of MOR in practically all fields of applied engineering.

Consolidation of MOR theory made algorithms more and more reliable. Therefore, 
unexpected applications started to be pursued even on biological systems. One of the 
most striking yet successful extensions is cardiovascular modeling (Chapter 8 of
Volume 3), which attempts a quantitative prediction of the behavior of the most
complex existing “system”, the human body. The same objective is shared by Chapter 7 of
Volume 3 on MOR applications to the neurosciences.

MOR continues its mainstream advancement in those areas, such as mathematics 
and control, where methodological aspects have been introduced and are still contin- 
uously refined. Chapter 11 of Volume 3 combines classical reduction approaches with 
graph theory for the reduction of network systems. This contribution is quite timely 
nowadays, when relations between distributed systems, agents, individuals at physi- 
cal or social level are often described and explained based on their networked inter- 
connection structure. Another timely application of MOR is described in Chapter 10 of 
Volume 3, discussing the very important aspects of uncertainty quantification, which 
play a fundamental role in all applications when the description of the systems in 
terms of their constitutive parameters is not deterministic but subject to stochastic 
variations. 

Chapter 12 of Volume 3 confirms the relevance of MOR in industrial production set- 
tings. The recent paradigm shift towards “Industry 4.0” augmented the requirements 
for sophisticated prediction methods and tools. It is now conceivable that suitably con- 
structed abstraction layers can be devised to build so-called “digital twins”, with the 
objective of mimicking the behavior of actual physical systems in real time and during 
their lifetime. This chapter provides an overview of the state of the art in this respect, 
where MOR plays once again a key role. 

We conclude this introduction by advising the reader to check Chapter 13 of Volume 3,
which provides an overview of existing MOR software. Several commercial and aca- 
demic software packages are reviewed, suitably classified with respect to the type of 
problems being addressed. Many of the latter packages can be freely downloaded, 
used, and possibly extended by active MOR researchers with new features and
functionalities.


Tobias Breiten and Tatjana Stykel
2 Balancing-related model reduction methods 


Abstract: This chapter provides an introduction to the concept of system balanc- 
ing. An overview of the historical development is given, and application areas for 
balancing-based model reduction are presented. Beginning with linear systems, the 
idea of a balanced system is explained and illustrated by an introductory example. 
A detailed description of the algorithmic realization, including implementable pseu- 
docodes, is provided and numerical challenges are pointed out. Generalizations of the 
classical method of balanced truncation are reviewed. In particular, more general sys- 
tem classes such as differential-algebraic equations as well as nonlinear systems are 
discussed. Two numerical examples resulting from common partial differential equa- 
tions are reviewed and analyzed with respect to the applicability of balancing-related 
methods. Pseudocodes will allow the reader to examine the method independently. 


Keywords: balanced truncation, Gramians, Lyapunov equations, Lur’e equations, 
differential-algebraic equations 


MSC 2010: 15A24, 34A09, 65F30, 93A15, 93C05, 93D30 


2.1 Introduction 


Balancing-based model reduction relies on the concept of truncating a system that is 
given in so-called balanced coordinates. The obvious questions to be discussed are: 
what are balanced coordinates and how do we obtain them? Regarding the first ques- 
tion, we consider different Gramian matrices that represent particular energies for the 
underlying system. For the second question, we introduce suitable state-space
transformations based on solutions of (non-)linear matrix equations that transform the
system into balanced coordinates. The final reduced-order models are then obtained by
discarding states from the balanced full-order model. The rationale is that in
balanced coordinates, we can easily find the states that contribute least to the
system energy.


2.1.1 Historical development and overview 


Balancing-based model reduction has its origin in the design and synthesis of digital
filters; see [90]. In their work, Mullis and Roberts study optimal and equal word
length filters obtained by state-space transformations of discrete-time linear systems.
In [89], the method was picked up from a detailed system-theoretic point of view and 
put into the context of principal component analysis. Besides relating the results from 
[90] to the concepts of controllability and observability, Moore already observed that 
preservation of stability is guaranteed for the reduced-order model. Together with the 
preservation of controllability and observability, this was proven in [100]. Another 
appealing feature of the classical balanced truncation method is the availability of 
an a priori error bound with respect to the $\mathcal{H}_\infty$-norm; see [40, 48]. An efficient
algorithmic realization of this method was developed in [79], and its numerical robustness
was enhanced in [115, 132]. 

Modifications of balanced truncation were proposed soon after the appearance 
of the work by Mullis and Roberts [90]. Let us mention stochastic balanced trunca- 
tion introduced in [38] and further investigated in, e.g., [55, 56, 140], positive real 
balanced truncation [38, 63, 92] and bounded real balanced truncation [92, 94]. For 
systems involving slow and fast dynamics, the method of singular perturbation ap- 
proximation was suggested and analyzed in [83]. For systems operating at a known 
frequency range, the method of frequency-weighted balanced truncation was intro- 
duced in [40] and further refined in, e. g., [139, 143], while frequency-limited balanced 
truncation was discussed in [23, 47]. In the context of designing reduced-order con- 
trollers, we mention linear quadratic Gaussian (LQG) balanced truncation which goes 
back to [137, 72, 93] and $\mathcal{H}_\infty$ balanced truncation [91]. Balanced truncation for
positive systems was presented in [43, 110]. Most of the balancing-related methods for
standard state-space systems were extended to differential-algebraic equations (DAEs) 
[29, 87, 108, 110, 127]. Furthermore, structure-preserving balanced truncation algo- 
rithms for second-order systems were considered in [27, 33, 86, 106], whereas balanced 
realizations for periodic discrete-time systems were discussed in [42, 135]. Balancing 
transformations for linear time-varying systems were first introduced in [119, 138] and 
further investigated in [78, 116]. The concept of balanced truncation was extended to 
a class of linear infinite-dimensional systems with finite-dimensional inputs and out- 
puts in [49] and further studied in a more general setting in [61, 105]. Based on appro- 
priate Hamilton-Jacobi equations, Scherpen extended balancing for linear systems to 
nonlinear systems in [117]. 

Due to the necessity of solving a set of (non-)linear matrix equations, for a long 
time, balanced truncation was considered to be applicable only to small and medium 
size systems. With the development of so-called low-rank methods, see, e. g., [24, 25, 
80, 97], balancing-based reduced-order models nowadays can also be computed for 
large-scale systems resulting from a spatial semi-discretization of multidimensional 
partial differential equations. For model reduction of parametric systems, a combina- 
tion of the reduced basis method for solving parameter-dependent matrix equations 
and balanced truncation was presented in [118, 122]. As a further topic of recent and 
ongoing research, we mention the use of other algebraic Gramians for balancing of
certain classes of control systems, among them linear stochastic systems [13, 26],
bilinear systems [1, 13, 16] and quadratic-bilinear systems [15].


2.1.2 Structure of this chapter 


In Section 2.2, we give an introduction to the classical version of balanced trunca- 
tion for linear time-invariant (LTI) control systems. We introduce the control-theoretic 
notation required for understanding the steps to construct a balanced reduced-order 
model. Based on an explicit minimal example with two states, we study the effect of a
balancing transformation and its consequences with respect to internal system prop- 
erties such as controllability and observability. We summarize the theoretical prop- 
erties of a balanced reduced-order model in Section 2.2.4 and provide a detailed self- 
implementable algorithm in Section 2.2.6. This also includes a discussion on approx- 
imation methods for large-scale linear matrix equations. Section 2.3 summarizes the 
different classes of Gramians used in the context of positive real balancing, bounded 
real balancing, LQG balancing, stochastic balancing, singular perturbation approx- 
imation, and cross-Gramian-based balancing, respectively. Furthermore, Section 2.4 
describes the required modifications of the method when additional algebraic con- 
straints are present, i.e., the underlying dynamics is described by a DAE system. In 
Section 2.5, we give an overview of different extensions of balanced truncation that 
are applicable in a nonlinear setting. In Section 2.6, we briefly discuss balanced trun- 
cation of (periodic) discrete-time systems and second-order systems. Section 2.7 illus- 
trates possibilities and limits of balancing-based model reduction by means of two test 
examples. 


2.2 Balanced truncation 


The content of this section is well known in the literature and can be found similarly 
in, e.g., [4, 11, 60]. For the presentation of the necessary control-theoretic concepts, 
we refer to any textbook on control theory, e. g., [4, 70, 123, 144]. 


2.2.1 Formulation of the problem 


For the remainder of this section, we consider a continuous LTI system of the form 


$$\begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t), \quad x(0) = x_0, \\ y(t) &= Cx(t) + Du(t), \end{aligned} \tag{2.1}$$


where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{p\times n}$ and $D \in \mathbb{R}^{p\times m}$. For fixed time $t$, we call $x(t)$,
$u(t)$ and $y(t)$ the state, control and output of the system. Unless stated otherwise, we
always assume that the system is asymptotically stable, i.e., the system matrix $A$ has
no eigenvalues in the closed right half-plane $\overline{\mathbb{C}^+}$. Given the original system (2.1) of
dimension $n$, the goal of model reduction is to construct a reduced-order system of the
same form 


$$\begin{aligned} \dot{\hat{x}}(t) &= \hat{A}\hat{x}(t) + \hat{B}u(t), \quad \hat{x}(0) = \hat{x}_0, \\ \hat{y}(t) &= \hat{C}\hat{x}(t) + \hat{D}u(t), \end{aligned} \tag{2.2}$$


where $\hat{A} \in \mathbb{R}^{r\times r}$, $\hat{B} \in \mathbb{R}^{r\times m}$, $\hat{C} \in \mathbb{R}^{p\times r}$ and $\hat{D} \in \mathbb{R}^{p\times m}$. Usually, for the construction of (2.2),
we have two (competing) goals. On the one hand, the system should actually be
a reduced-order system consisting of fewer system states. Formally, we thus require
$r \ll n$. On the other hand, the system should constitute an approximation of the
original system and we, therefore, also demand that the reduced output approximates the
original one, i.e., $\hat{y}(t) \approx y(t)$. For the construction of the reduced system matrices, we
employ a Petrov-Galerkin projection framework: given two subspaces $\mathcal{V}, \mathcal{W} \subset \mathbb{R}^n$ of
dimension $r$ and associated basis matrices $V, W \in \mathbb{R}^{n\times r}$, we approximate $x(t)$ by $V\hat{x}(t)$
and enforce an orthogonality constraint on the residual


$$V\dot{\hat{x}}(t) - AV\hat{x}(t) - Bu(t) \perp \mathcal{W}.$$


Since the columns of $W$ span the subspace $\mathcal{W}$, the latter condition can equivalently
be expressed as


$$W^T\bigl(V\dot{\hat{x}}(t) - AV\hat{x}(t) - Bu(t)\bigr) = 0. \tag{2.3}$$


In case of biorthogonal matrices $V$, $W$, we have $W^TV = I$ and (2.3) yields a reduced
system (2.2), where


$$\hat{A} = W^TAV, \quad \hat{B} = W^TB, \quad \hat{C} = CV.$$


Since the feedthrough term $D$ is independent of the system dimension $n$, we may
construct a reduced system with $\hat{D} = D$ and thus restrict ourselves to the case $D = 0$.
Note, however, that, for other classes of systems and variants of balanced truncation,
choosing $\hat{D} = D$ is not always appropriate.
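
The projection formulas above translate directly into a few lines of code. The following is a minimal sketch under stated assumptions: the helper name `petrov_galerkin_reduce` is hypothetical, the bases `V` and `W0` are random placeholders rather than bases obtained from balancing, and biorthogonality is enforced by rescaling the test basis, which requires $W^TV$ to be nonsingular.

```python
import numpy as np

def petrov_galerkin_reduce(A, B, C, V, W):
    """Project (A, B, C) with trial basis V and test basis W (both n x r).

    W is first rescaled so that W^T V = I, assuming W^T V is nonsingular.
    Returns A_hat = W^T A V, B_hat = W^T B, C_hat = C V and the rescaled W.
    """
    W = W @ np.linalg.inv(V.T @ W)
    A_hat = W.T @ A @ V
    B_hat = W.T @ B
    C_hat = C @ V
    return A_hat, B_hat, C_hat, W

# Hypothetical full-order data and placeholder bases.
rng = np.random.default_rng(1)
n, m, p, r = 8, 2, 3, 4
A = -np.diag(np.arange(1.0, n + 1.0))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W0 = V + 0.1 * rng.standard_normal((n, r))

A_hat, B_hat, C_hat, W = petrov_galerkin_reduce(A, B, C, V, W0)
```

The rescaling changes the basis matrix `W` but not the subspace it spans, so the orthogonality condition on the residual is unaffected. With `V = W = I` the full-order matrices are recovered.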


2.2.2 Basics from LTI system theory 


With regard to establishing a measure for the approximation quality of (2.2), recall that 
by means of the variation of constants formula for $x_0 = 0$ and $\hat{x}_0 = 0$, the output error
is given by

$$y(t) - \hat{y}(t) = \int_0^t \bigl(Ce^{A(t-\tau)}B - \hat{C}e^{\hat{A}(t-\tau)}\hat{B}\bigr)u(\tau)\,d\tau + (D - \hat{D})u(t).$$

An application of the Laplace transformation $\mathcal{L}[\cdot]$ allows us to rewrite the difference
in frequency domain as


$$\mathcal{L}[y](s) - \mathcal{L}[\hat{y}](s) = \bigl(G(s) - \hat{G}(s)\bigr)\mathcal{L}[u](s), \tag{2.4}$$
where
$$G(s) = C(sI - A)^{-1}B + D \tag{2.5}$$


and $\hat{G}(s) = \hat{C}(sI - \hat{A})^{-1}\hat{B} + \hat{D}$ are the transfer functions of systems (2.1) and (2.2),
respectively. Assuming additionally that the reduced system matrix $\hat{A}$ is asymptotically
stable, the transfer functions $G, \hat{G} : \mathbb{C}^+ \to \mathbb{C}^{p\times m}$ are analytic in $\mathbb{C}^+$ and we can consider
the Hardy space 


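$$\mathcal{H}_\infty = \bigl\{F : \mathbb{C}^+ \to \mathbb{C}^{p\times m} \mid F \text{ is analytic in } \mathbb{C}^+ \text{ and } \|F\|_{\mathcal{H}_\infty} < \infty\bigr\},$$
where


$$\|F\|_{\mathcal{H}_\infty} := \sup_{s \in \mathbb{C}^+} \|F(s)\|_2$$


and $\|\cdot\|_2$ denotes the spectral matrix norm. As a consequence of (2.4), the Plancherel
theorem implies


$$\|y - \hat{y}\|_{L_2(0,\infty;\mathbb{R}^p)} \le \|G - \hat{G}\|_{\mathcal{H}_\infty}\,\|u\|_{L_2(0,\infty;\mathbb{R}^m)}, \tag{2.6}$$


where the $L_2(0,\infty;\mathbb{R}^p)$-norm for a time-varying function $f : (0,\infty) \to \mathbb{R}^p$ is defined as


$$\|f\|_{L_2(0,\infty;\mathbb{R}^p)} = \Bigl(\int_0^\infty \|f(t)\|_2^2\,dt\Bigr)^{1/2}.$$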


Since balanced truncation yields an a priori error bound with respect to the $\mathcal{H}_\infty$-norm,
we can relate the quality of a reduced system to the $L_2(0,\infty;\mathbb{R}^p)$-error of the underlying
output signals.

Let us further recall that the finite-time controllability and observability Gramians 
of system (2.1) are defined as 


$$X(t) = \int_0^t e^{A\tau}BB^Te^{A^T\tau}\,d\tau, \qquad Y(t) = \int_0^t e^{A^T\tau}C^TCe^{A\tau}\,d\tau.$$


The relevance of these Gramians in the context of model reduction is due to their
connection to the input-output behavior of the system. In particular, given a reachable
state $x_d \in \mathbb{R}^n$, using the Moore-Penrose inverse $X(t_f)^\dagger$, we can define a control


$$\tilde{u}(t) = B^Te^{A^T(t_f - t)}X(t_f)^\dagger x_d$$

that steers system (2.1) from $x(0) = 0$ to $x(t_f) = x_d$ in time $t_f$. Moreover, this control is
optimal in the sense that, for an arbitrary control $u$ steering the system from $0$ to $x_d$,
we find that


$$x_d^TX(t_f)^\dagger x_d = \|\tilde{u}\|^2_{L_2(0,t_f;\mathbb{R}^m)} \le \|u\|^2_{L_2(0,t_f;\mathbb{R}^m)}. \tag{2.7}$$
Note that, for times $t_1 \le t_2$, we have
$$z^TX(t_1)^\dagger z \ge z^TX(t_2)^\dagger z \quad \text{for all } z \in \mathbb{R}^n. \tag{2.8}$$


This implies $X(t_1)^\dagger \succeq X(t_2)^\dagger$ or, equivalently, $X(t_1)^\dagger - X(t_2)^\dagger \succeq 0$, meaning that
$X(t_1)^\dagger - X(t_2)^\dagger$ is a positive semi-definite matrix. An analogous notation $\preceq 0$ will be
used for negative semi-definite matrices. The reasoning then is that the infinite-time
controllability Gramian


$$X = \int_0^\infty e^{At}BB^Te^{A^Tt}\,dt, \tag{2.9}$$


which exists since $A$ is asymptotically stable, encodes the minimum input energy
required to reach the target state $x_d$. On the infinite-time horizon, the asymptotic limit
of property (2.7) is usually expressed as


$$x_d^TX^\dagger x_d = \min_{\substack{u \in L_2(-\infty,0;\mathbb{R}^m) \\ x(-\infty)=0,\ x(0)=x_d}} \int_{-\infty}^0 u^T(t)u(t)\,dt. \tag{2.10}$$


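A similar conclusion can be drawn by noting that $Y(t_f)$ yields the output energy
$\|y\|^2_{L_2(0,t_f;\mathbb{R}^p)}$ generated by the initial condition $x(0) = x_0$. This energy is given by


$$\|y\|^2_{L_2(0,t_f;\mathbb{R}^p)} = \int_0^{t_f} \bigl(Ce^{A\tau}x_0\bigr)^T\bigl(Ce^{A\tau}x_0\bigr)\,d\tau = x_0^TY(t_f)x_0. \tag{2.11}$$
Hence, the infinite-time observability Gramian


$$Y = \int_0^\infty e^{A^Tt}C^TCe^{At}\,dt \tag{2.12}$$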


yields a natural way of measuring the amount of energy included in given states. In 
particular, it can be shown that X and Y satisfy 


$$AX + XA^T + BB^T = 0, \tag{2.13}$$


$$A^TY + YA + C^TC = 0. \tag{2.14}$$


The above equations are linear matrix equations in the unknowns $X, Y \in \mathbb{R}^{n\times n}$ and are
called Lyapunov equations. 
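
The link between the integral definitions (2.9), (2.12) and the Lyapunov equations (2.13), (2.14) can be checked numerically. The sketch below is only an illustration: the small test system is hypothetical, the function names are not from the text, and the finite-time Gramian is approximated with a plain trapezoidal rule. It relies on `scipy.linalg.solve_continuous_lyapunov(a, q)`, which solves $aX + Xa^H = q$.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def gramians(A, B, C):
    """Infinite-time Gramians from A X + X A^T = -B B^T and A^T Y + Y A = -C^T C."""
    X = solve_continuous_lyapunov(A, -B @ B.T)
    Y = solve_continuous_lyapunov(A.T, -C.T @ C)
    return X, Y

def finite_time_ctrl_gramian(A, B, t_f, steps=3000):
    """Trapezoidal approximation of X(t_f) = int_0^{t_f} e^{A tau} B B^T e^{A^T tau} d tau."""
    dt = t_f / steps
    E = expm(A * dt)
    M = B.copy()                      # holds e^{A tau} B at the current node
    vals = []
    for _ in range(steps + 1):
        vals.append(M @ M.T)
        M = E @ M
    vals = np.array(vals)
    return dt * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical asymptotically stable, controllable test system.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

X, Y = gramians(A, B, C)
X_quad = finite_time_ctrl_gramian(A, B, t_f=30.0)
residual = np.linalg.norm(A @ X + X @ A.T + B @ B.T)
```

For a sufficiently long horizon the quadrature value `X_quad` should agree with the Lyapunov solution `X` up to the quadrature error, while solving the matrix equation avoids any time integration.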


2.2.3 The concept of balancing


We have seen that the Gramians X and Y contain information about the input and 
output energy of the system. However, it remains open how this information can be 
used in order to obtain a reduced-order model. In general, we cannot expect that a 
state that is easy to reach produces, at the same time, a large amount of output energy. 
To illustrate this further, let us consider two different LTI systems of the form (2.1), 
where the system matrices are given by 


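$$A = \begin{bmatrix} -2 & -200 \\ 0 & -\tfrac{1}{2} \end{bmatrix}, \quad B = \begin{bmatrix} 2 \\ \tfrac{1}{100} \end{bmatrix}, \quad C = \begin{bmatrix} 0.02 & 1 \end{bmatrix}, \quad D = 0, \tag{2.15}$$


and


$$\tilde{A} = \begin{bmatrix} -2 & -\tfrac{200}{10001} \\ -\tfrac{200}{10001} & -\tfrac{1}{2} \end{bmatrix}, \quad \tilde{B} = \begin{bmatrix} 2 \\ \tfrac{1}{100} \end{bmatrix}, \quad \tilde{C} = \begin{bmatrix} 2 & 0.01 \end{bmatrix}, \quad \tilde{D} = 0. \tag{2.16}$$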


Without further knowledge, the approximability of the above two systems remains
unclear. Let us thus have a look at the transfer functions $G$ and $\tilde{G}$ that can (approximately)
be computed as


$$G(s) = \frac{0.05s}{s^2 + 2.5s + 1}, \qquad \tilde{G}(s) = \frac{4s + 1.999}{s^2 + 2.5s + 0.9996}.$$


From this point of view, one can argue that a reasonable approximation of the second
system is given as


$$\tilde{G}(s) \approx \frac{4s+2}{s^2+2.5s+1} = \frac{4(2s+1)}{(2s+1)(s+2)} = \frac{4}{s+2} =: \hat{G}(s).$$

In other words, we expect a one-dimensional reduced-order model realized via $\hat{A} = -2$
and $\hat{B} = \hat{C} = 2$ to yield a small approximation error. On the other hand, for the first
transfer function $G$, an immediate approximation is not obvious. With the previously
introduced concepts, one might ask whether there is a difference between $G$ and $\tilde{G}$
from a control-theoretic point of view. In this regard, let us analyze the corresponding
controllability and observability Gramians. In our example, it is possible to obtain the
exact solutions as


$$X = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-4} \end{bmatrix}, \quad Y = \begin{bmatrix} 10^{-4} & 0 \\ 0 & 1 \end{bmatrix}, \qquad \tilde{X} = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-4} \end{bmatrix}, \quad \tilde{Y} = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-4} \end{bmatrix}.$$


From (2.7) and (2.11), for the first system, we now conclude that states of the form
$x = \left[\begin{smallmatrix} \alpha \\ 0 \end{smallmatrix}\right]$ are easier to reach than those of the form $x = \left[\begin{smallmatrix} 0 \\ \alpha \end{smallmatrix}\right]$ with $\alpha \in \mathbb{R}$. On the other hand,
the output energy associated to states of the form $x = \left[\begin{smallmatrix} 0 \\ \alpha \end{smallmatrix}\right]$ is significantly larger than
the energy of the states $x = \left[\begin{smallmatrix} \alpha \\ 0 \end{smallmatrix}\right]$. The situation is different for the second system. Here,
states of the form $x = \left[\begin{smallmatrix} \alpha \\ 0 \end{smallmatrix}\right]$ are comparably easy to reach and, simultaneously, yield
a large amount of output energy. It is thus natural to construct a reduced system by
keeping the first coordinate while discarding the second one. As a surprising result,
this yields the already discussed reduced transfer function $\hat{G}(s) = \frac{4}{s+2}$. Let us further
emphasize that the Gramians $\tilde{X}$ and $\tilde{Y}$ of the second system are equal and diagonal,
i.e., they are in balanced form.
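
The two-state example can be reproduced numerically. The sketch below assumes the system matrices as written in the code (in particular the entries $-200/10001$ of the second system matrix and the output matrix $[2,\ 0.01]$, which are the values consistent with equal diagonal Gramians); the helper names are hypothetical. It solves the Lyapunov equations (2.13), (2.14) and compares the second system with the one-dimensional model $\hat{A} = -2$, $\hat{B} = \hat{C} = 2$ on a frequency grid.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def gramians(A, B, C):
    X = solve_continuous_lyapunov(A, -B @ B.T)
    Y = solve_continuous_lyapunov(A.T, -C.T @ C)
    return X, Y

def transfer(A, B, C, s):
    """Evaluate the SISO transfer function C (sI - A)^{-1} B at the point s."""
    n = A.shape[0]
    return (C @ np.linalg.solve(s * np.eye(n) - A, B))[0, 0]

# First system (assumed entries).
A1 = np.array([[-2.0, -200.0], [0.0, -0.5]])
B1 = np.array([[2.0], [0.01]])
C1 = np.array([[0.02, 1.0]])

# Second system (assumed entries).
a = -200.0 / 10001.0
A2 = np.array([[-2.0, a], [a, -0.5]])
B2 = np.array([[2.0], [0.01]])
C2 = np.array([[2.0, 0.01]])

X1, Y1 = gramians(A1, B1, C1)
X2, Y2 = gramians(A2, B2, C2)

# Keep the first coordinate of the second system: A_hat = -2, B_hat = C_hat = 2.
A_hat, B_hat, C_hat = A2[:1, :1], B2[:1, :], C2[:, :1]
omegas = np.logspace(-3, 3, 200)
max_err = max(abs(transfer(A2, B2, C2, 1j * w) - transfer(A_hat, B_hat, C_hat, 1j * w))
              for w in omegas)
```

Under these assumptions, the Gramians of the first system come out diagonal but with their large and small entries in opposite positions, whereas those of the second system coincide. The sampled error `max_err` stays within twice the discarded diagonal entry $10^{-4}$.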

The example indicates that balanced systems significantly simplify the decision 
process of which states to discard. Let us thus, for now, assume that system (2.1) is 
given in balanced form such that for the Gramians 


$$X = Y = \operatorname{diag}(\sigma_1, \ldots, \sigma_n).$$


We assume without loss of generality that the diagonal entries are ordered according
to $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$. Moreover, we focus on systems that are controllable and
observable, i.e., systems that satisfy $\sigma_n > 0$. For obtaining a reduced-order system, we
partition the system as follows: 


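$$\begin{aligned} \begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \end{bmatrix} &= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u(t), \\ y(t) &= \begin{bmatrix} C_1 & C_2 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + Du(t), \end{aligned} \tag{2.17}$$


where $A_{11} \in \mathbb{R}^{r\times r}$, $A_{12} \in \mathbb{R}^{r\times(n-r)}$, $A_{21} \in \mathbb{R}^{(n-r)\times r}$, $A_{22} \in \mathbb{R}^{(n-r)\times(n-r)}$, $B_1 \in \mathbb{R}^{r\times m}$,
$B_2 \in \mathbb{R}^{(n-r)\times m}$, $C_1 \in \mathbb{R}^{p\times r}$ and $C_2 \in \mathbb{R}^{p\times(n-r)}$. The crucial observation now is that the
balanced form allows us to compare states partitioned according to (2.17). Indeed,
given unit vectors $x_d = e_j$, $j \le r$, and $\tilde{x}_d = e_k$, $k > r$, for the optimal controls $u$ and $\tilde{u}$
that steer the system (asymptotically) to $x_d$ and $\tilde{x}_d$ in infinite time, we find that


$$\|u\|^2_{L_2(-\infty,0;\mathbb{R}^m)} = e_j^TX^{-1}e_j = \frac{1}{\sigma_j} \le \frac{1}{\sigma_k} = e_k^TX^{-1}e_k = \|\tilde{u}\|^2_{L_2(-\infty,0;\mathbb{R}^m)}.$$

At the same time, for the associated output signals $y$ and $\tilde{y}$, we have
$$\|y\|^2_{L_2(0,\infty;\mathbb{R}^p)} = e_j^TYe_j = \sigma_j \ge \sigma_k = e_k^TYe_k = \|\tilde{y}\|^2_{L_2(0,\infty;\mathbb{R}^p)}.$$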


It therefore seems natural to discard states of the form $x = \left[\begin{smallmatrix} 0 \\ x_2 \end{smallmatrix}\right]$ while keeping those of
the form $x = \left[\begin{smallmatrix} x_1 \\ 0 \end{smallmatrix}\right]$. As a consequence, we obtain a reduced-order model


$$\begin{aligned} \dot{x}_1(t) &= A_{11}x_1(t) + B_1u(t), \\ \hat{y}(t) &= C_1x_1(t) + Du(t), \end{aligned} \tag{2.18}$$


where $\hat{y}$ is an approximation of $y$.


2.2.4 Theoretical properties


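Let us now summarize the most important properties of the reduced-order
system (2.18). First of all, since the original model was assumed to be balanced, from (2.13)
and (2.14) we know that the Gramians satisfy


$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} + \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix} + \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \begin{bmatrix} B_1^T & B_2^T \end{bmatrix} = 0,$$
$$\begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix} \begin{bmatrix} Y_1 & 0 \\ 0 & Y_2 \end{bmatrix} + \begin{bmatrix} Y_1 & 0 \\ 0 & Y_2 \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + \begin{bmatrix} C_1^T \\ C_2^T \end{bmatrix} \begin{bmatrix} C_1 & C_2 \end{bmatrix} = 0,$$
where $X_1 = Y_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$ and $X_2 = Y_2 = \operatorname{diag}(\sigma_{r+1}, \ldots, \sigma_n)$. The eigenvalues
$\sigma_1, \ldots, \sigma_n$ of $X$ and $Y$ are called the Hankel singular values. In fact, they are singular
values of the Hankel operator of the underlying system; see, e.g., [4]. Inspection of
the (1,1) blocks of these matrix equations immediately shows that the Gramians of
the reduced system (2.18) are given by $X_1$ and $Y_1$, respectively. Hence, we obtain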


The reduced-order system is in balanced form. 


If we additionally assume that $\sigma_r > \sigma_{r+1}$, i.e., there exists a true gap between the
smallest diagonal entry of the Gramians $X_1 = Y_1$ and the largest diagonal entry of $X_2 = Y_2$,
it can further be shown, see [100], that


the reduced-order system is asymptotically stable.


Since the original system was assumed to be controllable and observable, all diagonal
entries of $X_1$ and $Y_1$ are nonzero, i.e.,


the reduced-order system is controllable and observable.


Moreover, an a priori error bound with respect to the $\mathcal{H}_\infty$-norm can be given.
For the original and reduced-order transfer functions $G$ and $\hat{G}$, we find that


$$\|G - \hat{G}\|_{\mathcal{H}_\infty} \le 2\sum_{i=r+1}^{n} \sigma_i. \tag{2.19}$$


For a more detailed presentation and corresponding proofs, we refer to the original 
work [40, 48, 89, 90, 100] or, e. g., [4, Chapter 7]. 
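
The properties summarized in this subsection can be observed on a small example. The sketch below is illustrative only: the diagonal test system is hypothetical, and the balancing transformation is built from Cholesky factors of the Gramians and one SVD, a standard construction adopted here as an assumption. The $\mathcal{H}_\infty$ error is merely sampled on a frequency grid, so it gives a lower estimate of the true norm.

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, svd

def gramians(A, B, C):
    X = solve_continuous_lyapunov(A, -B @ B.T)
    Y = solve_continuous_lyapunov(A.T, -C.T @ C)
    return 0.5 * (X + X.T), 0.5 * (Y + Y.T)   # symmetrize against round-off

def balance(A, B, C):
    """Return a balanced realization and the Hankel singular values."""
    X, Y = gramians(A, B, C)
    L_X = cholesky(X)                          # upper triangular, X = L_X^T L_X
    L_Y = cholesky(Y)                          # upper triangular, Y = L_Y^T L_Y
    U, hsv, Zt = svd(L_X @ L_Y.T)              # L_X L_Y^T = U diag(hsv) Z^T
    T = (Zt @ L_Y) / np.sqrt(hsv)[:, None]     # Sigma^{-1/2} Z^T L_Y
    T_inv = (L_X.T @ U) / np.sqrt(hsv)         # L_X^T U Sigma^{-1/2}
    return T @ A @ T_inv, T @ B, C @ T_inv, hsv

def transfer(A, B, C, s):
    return (C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B))[0, 0]

# Hypothetical stable, controllable and observable SISO system.
A = np.diag([-1.0, -2.0, -3.0, -4.0])
B = np.ones((4, 1))
C = np.array([[1.0, -1.0, 1.0, -1.0]])

Ab, Bb, Cb, hsv = balance(A, B, C)
Xb, Yb = gramians(Ab, Bb, Cb)

r = 2                                          # truncate to the leading r states
A_r, B_r, C_r = Ab[:r, :r], Bb[:r, :], Cb[:, :r]
bound = 2.0 * hsv[r:].sum()                    # right-hand side of (2.19)
omegas = np.logspace(-2, 3, 400)
max_err = max(abs(transfer(A, B, C, 1j * w) - transfer(A_r, B_r, C_r, 1j * w))
              for w in omegas)
```

In this sketch `Xb` and `Yb` should both equal `np.diag(hsv)`, the truncated matrix `A_r` should have all eigenvalues in the open left half-plane, and `max_err` should not exceed `bound`.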


2.2.5 Systems with inhomogeneous initial condition 


In the previous discussion, system (2.1) was assumed to have a homogeneous initial
condition $x(0) = 0$. If this is not the case, the Laplace transformation of the output
contains an additional term of the form $C(sI - A)^{-1}x_0$. We can thus define the augmented
transfer function $C(sI - A)^{-1}[B,\ x_0] + [D,\ 0]$ and subsequently apply the classical
balancing method to this system. Alternatively, we can interpret the inhomogeneous
system as the homogeneous one


$$\begin{aligned} \dot{x}(t) &= Ax(t) + [B,\ x_0] \begin{bmatrix} u(t) \\ u_0(t) \end{bmatrix}, \quad x(0) = 0, \\ y(t) &= Cx(t) + [D,\ 0] \begin{bmatrix} u(t) \\ u_0(t) \end{bmatrix}, \end{aligned}$$


where $u_0$ is considered to be a unit pulse input. This idea has been initially proposed
and theoretically studied in [65]; see also [9]. A variation of this approach has re- 
cently been proposed in [8] and relies on the superposition principle for linear sys- 
tems. Based on partitioning the system response into an uncontrolled part with inho- 
mogeneous initial condition and a controlled part with homogeneous initial condi- 
tion, the method reduces the associated systems individually. 
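
As a small illustration of the augmented formulation, the following sketch appends $x_0$ as an
extra input column and checks that the additional column of the augmented transfer function
reproduces the term $C(sI - A)^{-1}x_0$. The matrices, the evaluation point `s` and the helper
name `transfer` are illustrative assumptions, not data from the references.

```python
import numpy as np

# Illustrative (assumed) system data and initial state.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
x0 = np.array([[0.3], [-0.7]])

# Augmented input and feedthrough matrices [B, x0] and [D, 0].
B_aug = np.hstack([B, x0])
D_aug = np.hstack([D, np.zeros((C.shape[0], 1))])


def transfer(A, B, C, D, s):
    """Evaluate C (sI - A)^{-1} B + D at a single complex point s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D


s = 1.0 + 2.0j
G = transfer(A, B, C, D, s)
G_aug = transfer(A, B_aug, C, D_aug, s)
# Contribution of the initial state to the Laplace transform of the output.
init_term = C @ np.linalg.solve(s * np.eye(2) - A, x0)
```

The first column of `G_aug` equals the original transfer function, the second one carries the
initial-state term, so any balancing routine for homogeneous systems can be applied to
`(A, B_aug, C, D_aug)` unchanged.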


2.2.6 Algorithmic realization 


Up to this point, for the construction of the reduced-order model (2.18), we assumed
(2.1) to be in balanced form. A natural question is how to obtain this
balanced form for a general system. In this subsection, we discuss the computation
of such a balanced realization which, subsequently, can be truncated to construct
a balanced reduced-order model.


Square root balanced truncation method 

The following method is typically referred to as square root balanced truncation and
goes back to [79]. Let us analyze the effect of a coordinate transformation $\tilde{x} = Tx$
characterized by a regular transformation matrix $T \in \mathbb{R}^{n \times n}$. Rewriting the
dynamics with respect to the coordinates $\tilde{x}$, we obtain the equivalent control system

\[
\begin{aligned}
\dot{\tilde{x}}(t) &= \tilde{A}\tilde{x}(t) + \tilde{B}u(t), \\
y(t) &= \tilde{C}\tilde{x}(t) + Du(t),
\end{aligned} \tag{2.20}
\]

with $\tilde{A} = TAT^{-1}$, $\tilde{B} = TB$ and $\tilde{C} = CT^{-1}$. The associated infinite-time
controllability and observability Gramians are then given by

\[
\begin{aligned}
\tilde{X} &= \int_0^\infty e^{\tilde{A}t}\tilde{B}\tilde{B}^T e^{\tilde{A}^T t}\,dt
  = \int_0^\infty T e^{At} T^{-1} T B B^T T^T T^{-T} e^{A^T t} T^T\,dt = TXT^T, \\
\tilde{Y} &= \int_0^\infty e^{\tilde{A}^T t}\tilde{C}^T\tilde{C} e^{\tilde{A}t}\,dt
  = \int_0^\infty T^{-T} e^{A^T t} T^T T^{-T} C^T C T^{-1} T e^{At} T^{-1}\,dt = T^{-T}YT^{-1}.
\end{aligned}
\]

Hence, we seek $T$ such that $TXT^T = T^{-T}YT^{-1} = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$. With this
in mind, we assume that the symmetric positive definite Gramians $X$ and $Y$ are given in
Cholesky form

\[
X = L_X^T L_X, \qquad Y = L_Y^T L_Y,
\]

where $L_X, L_Y \in \mathbb{R}^{n \times n}$ are not necessarily upper triangular matrices. Let us further
compute the singular value decomposition (SVD) of the product of the Cholesky factors,
i.e.,

\[
L_X L_Y^T = U \Sigma Z^T,
\]

where $U, Z \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) > 0$ is
positive definite. By algebraic manipulations, the reader may verify that a balancing coordinate
transformation is given by $T = \Sigma^{-1/2} Z^T L_Y$ with inverse $T^{-1} = L_X^T U \Sigma^{-1/2}$. In particular,
for the Gramians of the transformed system (2.20), we find that $\tilde{X} = \tilde{Y} = \Sigma$. From here, it
is now possible to construct (2.18) by simply discarding the $(n - r)$-dimensional part of the
balanced dynamics. However, from a numerical point of view, it is advisable to directly compute
projection matrices $V$ and $W$ that lead to a reduced system (2.2) which coincides with (2.18).
Indeed, the balancing coordinate transformation $T$ is ill-conditioned whenever some
of the Hankel singular values are very small. Let us thus consider a partitioning

\[
L_X L_Y^T = [U_1, U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
\begin{bmatrix} Z_1^T \\ Z_2^T \end{bmatrix}, \tag{2.21}
\]

where $U_1, Z_1 \in \mathbb{R}^{n \times r}$ and $\Sigma_1 \in \mathbb{R}^{r \times r}$. Corresponding to the Petrov-Galerkin
framework, we now set

\[
V = L_X^T U_1 \Sigma_1^{-1/2}, \qquad W = L_Y^T Z_1 \Sigma_1^{-1/2}.
\]

Again, by algebraic manipulations it can be verified that $V$ and $W$ satisfy $W^T V = I$
and that $(A_{11}, B_1, C_1, D) = (W^T A V, W^T B, CV, D)$.
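
The balancing property of $T = \Sigma^{-1/2} Z^T L_Y$ can be checked numerically on a small example.
The sketch below is a minimal illustration under stated assumptions: the test system is made up,
the helper `lyap_kron` is a hypothetical dense solver based on vectorization (only sensible for
tiny $n$), and the Gramians are assumed to solve $AX + XA^T = -BB^T$ and $A^TY + YA = -C^TC$.

```python
import numpy as np


def lyap_kron(A, G):
    """Solve A X + X A^T = G for symmetric G by vectorization (small n only)."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    X = np.linalg.solve(K, G.flatten(order="F")).reshape((n, n), order="F")
    return (X + X.T) / 2  # remove rounding asymmetry


# Assumed stable, controllable and observable test system.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.5], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 1.0, 1.0]])

X = lyap_kron(A, -B @ B.T)      # controllability Gramian
Y = lyap_kron(A.T, -C.T @ C)    # observability Gramian

# NumPy returns a lower factor L with X = L L^T, hence LX = L^T gives X = LX^T LX.
LX = np.linalg.cholesky(X).T
LY = np.linalg.cholesky(Y).T

U, sigma, Zt = np.linalg.svd(LX @ LY.T)          # LX LY^T = U diag(sigma) Z^T
T = np.diag(sigma ** -0.5) @ Zt @ LY             # balancing transformation
T_inv = LX.T @ U @ np.diag(sigma ** -0.5)        # its inverse, formed without inverting T

X_bal = T @ X @ T.T
Y_bal = T_inv.T @ Y @ T_inv
```

Both `X_bal` and `Y_bal` agree with `np.diag(sigma)` up to rounding. Note that `T` scales with
`sigma ** -0.5`, which is exactly the source of ill-conditioning mentioned above when some
Hankel singular values are tiny.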


Linear matrix equations 

As we have seen, the square root balancing method relies on the computation of the 
Cholesky factors $L_X$ and $L_Y$, which, in turn, depend on the Gramians $X$ and $Y$. In order
to perform the transformation step, we thus have to compute X and Y either via their 
integral representations or as the solutions to the Lyapunov equations (2.13) and (2.14). 
Here, we focus on the latter approach. For approximation techniques based on quadra- 
ture, we refer to [113]. As a representative for both X and Y, consider a linear matrix 
equation for the unknown X of the form 


\[
AX + XA^T = G, \tag{2.22}
\]

where $A, G \in \mathbb{R}^{n \times n}$. From an abstract linear algebra point of view, we can replace this
matrix equation with an ordinary linear system of dimension $n^2$. For this purpose, we
recall the vectorization operator as well as the Kronecker product defined by


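\[
\begin{aligned}
\mathrm{vec}\colon \mathbb{R}^{n \times m} \to \mathbb{R}^{nm}, \quad
  & \mathrm{vec}(A) = [a_{11}, \ldots, a_{n1}, a_{12}, \ldots, a_{n2}, \ldots, a_{1m}, \ldots, a_{nm}]^T, \\
\otimes\colon \mathbb{R}^{n \times m} \times \mathbb{R}^{p \times q} \to \mathbb{R}^{np \times mq}, \quad
  & A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1m}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nm}B \end{bmatrix}.
\end{aligned}
\]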


For matrices A, B and C of compatible dimensions, these operators are related via the 
formula 


\[
\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B).
\]


We can now apply the vectorization operator to both sides in (2.22) to obtain the equiv- 
alent linear system 


\[
(I \otimes A + A \otimes I)\,\mathrm{vec}(X) = \mathrm{vec}(G), \tag{2.23}
\]


where $I$ denotes the identity matrix of dimension $n$. While this yields the possibility
to compute the solution by standard solvers for linear systems, the complexity now
scales with the dimension $n^2$, e.g., a standard LU decomposition will have complexity
$\mathcal{O}(n^6)$. For small to medium scale systems, the Bartels-Stewart algorithm [6] can be
considered to be the method of choice. The main idea of this algorithm is first to transform
the matrix $A$ into a real Schur form $Q^T A Q$ by means of an orthogonal transformation
$Q \in \mathbb{R}^{n \times n}$. The quasi-upper triangular form of the resulting matrix $Q^T A Q$ can then
be exploited to obtain a solution of (2.22). A detailed description of the method can
be found in [6]. Note that a corresponding MATLAB implementation lyap is provided
by the Control System Toolbox. As a variant of the Bartels-Stewart algorithm, let
us mention Hammarling's method [62] which directly computes a Cholesky factorization
$X = L_X^T L_X$, $L_X \in \mathbb{R}^{n \times n}$, of the solution of (2.22) with $G = -BB^T$. Again, a MATLAB
implementation lyapchol is accessible in the Control System Toolbox.
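
To make the vectorized formulation (2.23) concrete, the following minimal sketch solves a small
Lyapunov equation through the $n^2 \times n^2$ Kronecker system and checks the identity
$\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B)$ on random data. The function name
`solve_lyapunov_kron` and the test matrices are assumptions made for illustration; for anything
beyond toy dimensions one would rather call a Bartels-Stewart type routine such as
`scipy.linalg.solve_continuous_lyapunov`.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.flatten(order="F")


def solve_lyapunov_kron(A, G):
    """Solve A X + X A^T = G via (I kron A + A kron I) vec(X) = vec(G)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(A, I)      # n^2 x n^2 coefficient matrix
    x = np.linalg.solve(K, vec(G))         # dense solve, O(n^6) work
    return x.reshape((n, n), order="F")    # undo the vectorization


# Check vec(P Q R) = (R^T kron P) vec(Q) for compatible random matrices.
rng = np.random.default_rng(0)
P = rng.standard_normal((2, 3))
Q = rng.standard_normal((3, 4))
R = rng.standard_normal((4, 2))
identity_gap = np.linalg.norm(vec(P @ Q @ R) - np.kron(R.T, P) @ vec(Q))

# Controllability-type Lyapunov equation with G = -B B^T for an assumed stable A.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.5], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
G = -B @ B.T
X = solve_lyapunov_kron(A, G)
residual = np.linalg.norm(A @ X + X @ A.T - G)
```

The reshaping with `order="F"` matters: the formula (2.23) assumes that `vec` stacks columns,
whereas NumPy flattens row-wise by default.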

For systems resulting from a spatial semi-discretization of partial differential 
equations (PDEs), the previous methods are often not computationally feasible. As 
a remedy, in recent years, there has been an increasing interest in finding efficient 
approximation techniques for linear matrix equations of the form (2.22). Most of these 
methods can be categorized into Krylov subspace methods, alternating direction
implicit (ADI) based methods as well as iterative solvers that exploit the specific
Kronecker structure. For a recent survey of low-rank methods for large-scale matrix
equations, we refer to [28, 120]. A more detailed discussion with application to
model reduction by balanced truncation can be found in [11]. Common to all these
methods is that they rely on a low-rank approximation of the true solution which is
known to exist for a large class of systems including parabolic second-order PDEs; see 
[5, 51, 57, 95, 98]. 




Pseudocode 

A pseudocode summarizing the classical balanced truncation method is shown in
Algorithm 2.1. For large-scale systems, steps 1 and 2 should be replaced by an approximation
technique as described previously. Let us emphasize that the remaining steps
remain unchanged when

\[
X = L_X^T L_X \approx \tilde{L}_X^T \tilde{L}_X, \qquad Y = L_Y^T L_Y \approx \tilde{L}_Y^T \tilde{L}_Y
\]

are approximated by low-rank matrices with factors $\tilde{L}_X \in \mathbb{R}^{k_X \times n}$ and
$\tilde{L}_Y \in \mathbb{R}^{k_Y \times n}$, where $k_X, k_Y \ll n$.


Algorithm 2.1: Balanced truncation for LTI control systems in MATLAB.

Require: Original system $A, B, C, D$; error tolerance tol.
Ensure: Reduced system $\hat{A}, \hat{B}, \hat{C}, \hat{D}$ with $\|G - \hat{G}\|_{\mathcal{H}_\infty} \le$ tol.

1: Compute the solution $X = L_X^T L_X$ of (2.13).          ▷ LX = lyapchol(A,B)
2: Compute the solution $Y = L_Y^T L_Y$ of (2.14).          ▷ LY = lyapchol(A',C')
3: Compute the SVD (2.21) of $L_X L_Y^T$,                   ▷ [U,Σ,Z] = svd(LX*LY')
   where $r$ is chosen such that $2 \sum_{i=r+1}^{n} \sigma_i \le$ tol.
4: Define $V = L_X^T U_1 \Sigma_1^{-1/2}$ and $W = L_Y^T Z_1 \Sigma_1^{-1/2}$.
5: Define $\hat{A} = W^T A V$, $\hat{B} = W^T B$, $\hat{C} = CV$ and $\hat{D} = D$.
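
For readers without access to MATLAB, the five steps of Algorithm 2.1 can be sketched in Python
with NumPy. This is an illustrative dense implementation under assumptions, not a reference
code: the Lyapunov equations are solved by the Kronecker formulation (so only tiny $n$ is
feasible), (2.13) and (2.14) are taken to be $AX + XA^T = -BB^T$ and $A^TY + YA = -C^TC$, and
the names `lyap_kron`, `balanced_truncation` and `freq_resp` as well as the test system are
hypothetical.

```python
import numpy as np


def lyap_kron(A, G):
    """Solve A X + X A^T = G for symmetric G by vectorization (small n only)."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    X = np.linalg.solve(K, G.flatten(order="F")).reshape((n, n), order="F")
    return (X + X.T) / 2


def balanced_truncation(A, B, C, D, tol):
    """Square root balanced truncation along the steps of Algorithm 2.1."""
    X = lyap_kron(A, -B @ B.T)                 # step 1
    Y = lyap_kron(A.T, -C.T @ C)               # step 2
    LX = np.linalg.cholesky(X).T               # X = LX^T LX
    LY = np.linalg.cholesky(Y).T               # Y = LY^T LY
    U, sigma, Zt = np.linalg.svd(LX @ LY.T)    # step 3
    n = len(sigma)
    # Smallest r >= 1 such that 2 * (sigma_{r+1} + ... + sigma_n) <= tol.
    r = next(k for k in range(1, n + 1) if 2.0 * sigma[k:].sum() <= tol)
    S = np.diag(sigma[:r] ** -0.5)
    V = LX.T @ U[:, :r] @ S                    # step 4
    W = LY.T @ Zt[:r, :].T @ S
    return W.T @ A @ V, W.T @ B, C @ V, D, sigma, r   # step 5


def freq_resp(A, B, C, D, w):
    """Transfer function value at s = i*w."""
    return C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B) + D


A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.5], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 1.0, 1.0]])
D = np.array([[0.0]])

Ar, Br, Cr, Dr, sigma, r = balanced_truncation(A, B, C, D, tol=1e-2)
bound = 2.0 * sigma[r:].sum()
# Sampled error on a frequency grid; this can only underestimate the H-infinity error.
err = max(
    np.linalg.norm(freq_resp(A, B, C, D, w) - freq_resp(Ar, Br, Cr, Dr, w), 2)
    for w in np.logspace(-3, 3, 400)
)
```

The sampled error `err` stays below the a priori bound `bound` from (2.19), and the reduced
matrix `Ar` is stable. In a large-scale setting, `LX` and `LY` would be replaced by low-rank
factors while steps 3 to 5 stay the same.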


2.3 Variants of classical balancing 


In this section, we present a brief survey on other balancing-related model reduction 
techniques which have been developed for various classes of control systems with dif- 
ferent control-theoretic properties. A key idea of all these methods is to define a pair of 
Gramians which characterize the inherent properties of the particular class of systems. 
Then the reduced-order model is obtained by transforming the system into a balanced 
form such that the Gramians of the transformed system are equal and diagonal and 
truncating the state components corresponding to the small diagonal elements of the 
Gramians. 

We start our consideration by introducing a concept of dissipativity which was
extensively studied by Willems [141, 142]. System (2.1) is called dissipative with respect to
a supply rate $w(u(t), y(t))$ if there exists a non-negative function $S\colon \mathbb{R}^n \to \mathbb{R}$ satisfying
the dissipation inequality

\[
S(x(t_0)) + \int_{t_0}^{t_1} w(u(t), y(t))\,dt \ge S(x(t_1)) \tag{2.24}
\]

for all $t_0, t_1 \in \mathbb{R}$ with $t_1 \ge t_0$ and all $u \in L_2(t_0, t_1; \mathbb{R}^m)$. The function $S$ is called the
storage function. The dissipation inequality implies that the increase in the internal
energy described by the storage function $S$ on the time interval $(t_0, t_1)$ does not exceed
the energy supplied to the system. By choosing the quadratic supply rate

\[
w(u(t), y(t)) = y^T(t)Qy(t) + 2y^T(t)Su(t) + u^T(t)Ru(t) \tag{2.25}
\]

with $Q = Q^T \in \mathbb{R}^{p \times p}$, $S \in \mathbb{R}^{p \times m}$ and $R = R^T \in \mathbb{R}^{m \times m}$, the dissipativity of (2.1) can be
characterized in terms of the Kalman-Yakubovich-Popov lemma [3, 32]. If system (2.1)
is minimal, i.e., it is controllable and observable, then the following statements are
equivalent:

1. System (2.1) is dissipative with respect to the supply rate $w$ as in (2.25).

2. There exists a symmetric, positive definite matrix $Y \in \mathbb{R}^{n \times n}$ such that the linear
matrix inequality (LMI)

\[
\begin{bmatrix}
A^T Y + YA - C^T Q C & YB - C^T(QD + S) \\
B^T Y - (QD + S)^T C & -R - D^T Q D - D^T S - S^T D
\end{bmatrix} \preceq 0 \tag{2.26}
\]

is fulfilled.

3. There exists a matrix triple $(Y, K, J)$ with $K \in \mathbb{R}^{m \times n}$, $J \in \mathbb{R}^{m \times m}$ and symmetric,
positive definite $Y \in \mathbb{R}^{n \times n}$ satisfying the Lur'e matrix equation

\[
\begin{bmatrix}
A^T Y + YA - C^T Q C & YB - C^T(QD + S) \\
B^T Y - (QD + S)^T C & -R - D^T Q D - D^T S - S^T D
\end{bmatrix}
= -\begin{bmatrix} K^T \\ J^T \end{bmatrix} \begin{bmatrix} K^T \\ J^T \end{bmatrix}^T. \tag{2.27}
\]

Note that, if $R_0 = R + D^T Q D + D^T S + S^T D$ is positive definite, then $J$ is nonsingular and,
hence, the Lur'e equation (2.27) can equivalently be written as the Riccati equation

\[
A^T Y + YA - C^T Q C + (YB - C^T(QD + S)) R_0^{-1} (YB - C^T(QD + S))^T = 0.
\]


In general, the solution of (2.26) is not unique. There exist, however, unique solutions
$Y_{\min}$ and $Y_{\max}$ such that $0 < Y_{\min} \le Y \le Y_{\max}$ for all symmetric solutions $Y$ of (2.26).
These extremal solutions can be used to characterize the required supply and available
storage of system (2.1) defined as

\[
\begin{aligned}
S_r(x_0) &= \inf_{\substack{u \in L_2(-\infty, 0; \mathbb{R}^m) \\ x(-\infty) = 0,\; x(0) = x_0}}
  \int_{-\infty}^{0} w(u(\tau), y(\tau))\,d\tau, \\
S_a(x_0) &= \sup_{\substack{u \in L_2(0, \infty; \mathbb{R}^m) \\ x(0) = x_0,\; x(\infty) = 0}}
  -\int_{0}^{\infty} w(u(\tau), y(\tau))\,d\tau,
\end{aligned}
\]

respectively. The required supply $S_r(x_0)$ describes the minimum amount of energy that
has to be supplied to the system to steer it from the zero state into the state $x(0) = x_0$,
whereas the available storage $S_a(x_0)$ determines the maximum amount of energy that
can be extracted from the system starting at the initial state $x(0) = x_0$ and reaching
the zero state. For these quadratic functionals, we find that

\[
S_r(x_0) = x_0^T Y_{\max} x_0, \qquad S_a(x_0) = x_0^T Y_{\min} x_0;
\]

see [142]. These energy representations suggest, just as in the classical balancing
approach, taking $Y_{\max}^{-1}$ and $Y_{\min}$ as a pair of Gramians and using them for balancing and
truncation. Since the numerical solution of LMIs is, in general, much more expensive
than that of Lur'e or Riccati equations, we restrict ourselves to the definition of
the Gramians as solutions to matrix equations. Let us further emphasize that the dual
Lur'e equation

\[
\begin{bmatrix}
AX + XA^T - BQB^T & XC^T - B(DQ + S)^T \\
CX - (DQ + S)B^T & -Q - DRD^T - DS^T - SD^T
\end{bmatrix}
= -\begin{bmatrix} L \\ F \end{bmatrix} \begin{bmatrix} L \\ F \end{bmatrix}^T \tag{2.28}
\]

for a matrix triple $(X, L, F)$ with $X \in \mathbb{R}^{n \times n}$, $L \in \mathbb{R}^{n \times p}$ and $F \in \mathbb{R}^{p \times p}$ is also of
great interest. One can show, see, e.g., [56, 92], that the extremal solutions of (2.27) and (2.28)
satisfy the relations

\[
Y_{\min} = X_{\max}^{-1}, \qquad Y_{\max} = X_{\min}^{-1}.
\]

Thus, to avoid the explicit inversion of $Y_{\max}$, we can balance the minimal solutions
$Y_{\min}$ and $X_{\min}$ of the Lur'e equations (2.27) and (2.28), respectively. In the following,
we show that several particular choices of Q, S and R in (2.25) correspond to various 
physical properties of the control system (2.1) and lead to different balanced truncation 
methods that preserve these properties in reduced-order models. 


2.3.1 Positive real balancing 


First, we consider system (2.1) with $m = p$ and set $Q = 0$, $S = \tfrac{1}{2} I_m$ and $R = 0$. Then the
supply rate takes the form

\[
w(u(t), y(t)) = y^T(t)u(t).
\]

In this case, system (2.1) satisfying the dissipation inequality (2.24) is called passive.
Such systems play an important role in circuit theory and network analysis [3]; see
also Chapter 5 of this volume. Passivity-preserving balanced truncation for such systems
was considered in [63, 101]. If system (2.1) is controllable, then the dissipation
inequality (2.24) is equivalent to the condition

\[
\int_0^t y^T(\tau)u(\tau)\,d\tau \ge 0
\]

which holds for all $t \ge 0$ and all $u \in L_2(0, \infty; \mathbb{R}^m)$; see [142]. This condition is often
used as a definition of passivity. It is equivalent to the positive realness of the transfer
function $G(s) = C(sI - A)^{-1}B + D$ meaning that $G$ is analytic in $\mathbb{C}^+$ and
$G(s) + G^T(\bar{s}) \succeq 0$ for all $s \in \mathbb{C}^+$; see [3]. This property is, further, equivalent to the
solvability of the positive real Lur'e equations

\[
\begin{bmatrix} AX + XA^T & XC^T - B \\ CX - B^T & -D - D^T \end{bmatrix}
= -\begin{bmatrix} L \\ F \end{bmatrix} \begin{bmatrix} L \\ F \end{bmatrix}^T \tag{2.29}
\]

and

\[
\begin{bmatrix} A^T Y + YA & YB - C^T \\ B^T Y - C & -D - D^T \end{bmatrix}
= -\begin{bmatrix} K^T \\ J^T \end{bmatrix} \begin{bmatrix} K^T \\ J^T \end{bmatrix}^T. \tag{2.30}
\]

The positive real Gramians of system (2.1) are then defined as the minimal solutions

\[
X^{\mathrm{PR}} = X_{\min}, \qquad Y^{\mathrm{PR}} = Y_{\min}
\]

of these equations. In positive real balancing, system (2.1) is transformed into the
coordinates such that $X^{\mathrm{PR}} = Y^{\mathrm{PR}} = \mathrm{diag}(\sigma_1^{\mathrm{PR}}, \ldots, \sigma_n^{\mathrm{PR}})$ with the positive real
characteristic values $\sigma_i^{\mathrm{PR}}$ ordered decreasingly. Applying the square root balanced truncation
method as in Algorithm 2.1 with $X$ and $Y$ replaced by $X^{\mathrm{PR}}$ and $Y^{\mathrm{PR}}$, respectively, we get
the reduced-order system (2.2) which is passive and satisfies the error bound

\[
\|G - \hat{G}\|_{\mathcal{H}_\infty} \le 2\,\|(D + D^T)^{-1}\|_2\, \|G + D^T\|_{\mathcal{H}_\infty}\,
\|\hat{G} + D^T\|_{\mathcal{H}_\infty} \sum_{i=r+1}^{n} \sigma_i^{\mathrm{PR}}
\]

provided $D + D^T$ is nonsingular; see [60, 92].
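
The two characterizations of passivity used above can be illustrated on a toy example. The
sketch below is only a plausibility check under assumptions: the system with symmetric stable
$A$, $C = B^T$ and $D > 0$ is made up, positive realness is merely sampled on the imaginary axis
(a necessary condition, not a proof), and $Y = I$ is a hand-picked feasible point of the
positive real LMI, not the minimal solution $Y_{\min}$.

```python
import numpy as np

# Assumed passive test system: A symmetric negative definite, C = B^T, D > 0.
A = np.array([[-1.0, 0.0], [0.0, -3.0]])
B = np.array([[1.0], [2.0]])
C = B.T.copy()
D = np.array([[0.5]])


def popov(omega):
    """Hermitian matrix G(i*omega) + G(i*omega)^H."""
    G = C @ np.linalg.solve(1j * omega * np.eye(2) - A, B) + D
    return G + G.conj().T


# Smallest eigenvalue of G + G^H over a frequency grid.
min_eig = min(np.linalg.eigvalsh(popov(w)).min() for w in np.logspace(-3, 3, 200))

# Positive real LMI: left-hand side of (2.30) must be negative semi-definite.
Y = np.eye(2)
lmi = np.block([[A.T @ Y + Y @ A, Y @ B - C.T],
                [B.T @ Y - C, -D - D.T]])
max_lmi_eig = np.linalg.eigvalsh(lmi).max()
```

For this example the LMI matrix is diagonal with negative entries, so `max_lmi_eig` is negative,
and `min_eig` stays at least $D + D^T = 1$ on the whole grid.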


2.3.2 Bounded real balancing 


An important property of network systems in the scattering form [3] is contractivity.
This property corresponds to dissipativity with respect to the supply rate

\[
w(u(t), y(t)) = -y^T(t)y(t) + u^T(t)u(t)
\]

obtained from (2.25) by taking $Q = -I_p$, $S = 0$ and $R = I_m$. The controllable system (2.1)
is contractive if and only if its transfer function $G$ is bounded real, i.e., $G$ is analytic in
$\mathbb{C}^+$ and $I - G^T(\bar{s})G(s) \succeq 0$ for all $s \in \mathbb{C}^+$; see [3]. Such systems are useful, for example,
in $L_2$-gain constraint controller design [50]. To verify contractivity, we can also use the
bounded real Lur'e equations

\[
\begin{bmatrix} AX + XA^T + BB^T & XC^T + BD^T \\ CX + DB^T & DD^T - I \end{bmatrix}
= -\begin{bmatrix} L \\ F \end{bmatrix} \begin{bmatrix} L \\ F \end{bmatrix}^T
\]

and

\[
\begin{bmatrix} A^T Y + YA + C^T C & YB + C^T D \\ B^T Y + D^T C & D^T D - I \end{bmatrix}
= -\begin{bmatrix} K^T \\ J^T \end{bmatrix} \begin{bmatrix} K^T \\ J^T \end{bmatrix}^T.
\]

Their minimal solutions define the bounded real Gramians

\[
X^{\mathrm{BR}} = X_{\min}, \qquad Y^{\mathrm{BR}} = Y_{\min}.
\]

Transforming the LTI system (2.1) into a bounded real balanced form such that
$X^{\mathrm{BR}} = Y^{\mathrm{BR}} = \mathrm{diag}(\sigma_1^{\mathrm{BR}}, \ldots, \sigma_n^{\mathrm{BR}})$ and truncating the states corresponding to small
bounded real characteristic values $\sigma_i^{\mathrm{BR}}$ results in a contractive reduced-order model (2.2)
satisfying the error bound

\[
\|G - \hat{G}\|_{\mathcal{H}_\infty} \le 2 \sum_{i=r+1}^{n} \sigma_i^{\mathrm{BR}}.
\]

These properties of the bounded real balanced truncation method were proved in [94].

It is well known that the square transfer function $G(s)$ is bounded real if and only
if its Moebius transform given by $G_M(s) = (I - G(s))(I + G(s))^{-1}$ is positive real [39]. Note
that this transform coincides with its inverse, i.e.,

\[
G(s) = (I - G_M(s))(I + G_M(s))^{-1}.
\]

This relation leads to another model reduction method which preserves contractivity
(resp. passivity). It consists in computing a reduced-order system

\[
\hat{G}(s) = (I - \hat{G}_M(s))(I + \hat{G}_M(s))^{-1},
\]

where the approximation $\hat{G}_M(s)$ is obtained by the positive real (resp. bounded real)
balanced truncation applied to the Moebius-transformed system $G_M(s)$; see [108] for
details.
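
The involution property of the Moebius transform is easy to verify pointwise. The following
sketch assumes the transform in the form $M(G) = (I - G)(I + G)^{-1}$ and applies it to a single
complex matrix that plays the role of a transfer function value $G(s_0)$ with spectral norm
below one; the matrix entries and the function name `moebius` are illustrative assumptions.

```python
import numpy as np


def moebius(G):
    """Moebius (Cayley-type) transform (I - G)(I + G)^{-1} of a square matrix."""
    I = np.eye(G.shape[0])
    return (I - G) @ np.linalg.inv(I + G)


# Assumed value of a bounded real transfer function at one point: a strict contraction.
G = np.array([[0.3 + 0.2j, 0.1], [-0.2, 0.4 - 0.1j]])
contraction_norm = np.linalg.norm(G, 2)

G_M = moebius(G)
# Positive realness at this point: the Hermitian part of G_M is positive definite.
herm_min_eig = np.linalg.eigvalsh(G_M + G_M.conj().T).min()
# Applying the transform twice returns the original value.
G_back = moebius(G_M)
```

A short calculation shows why this works: $G_M + G_M^H = 2(I + G)^{-1}(I - GG^H)(I + G)^{-H}$,
which is positive definite exactly when $G$ is a strict contraction.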


2.3.3 Linear-quadratic Gaussian balancing 


Although Lyapunov-based balanced truncation is a well-established model reduction 
method, it suffers from some limitations when applied to controller reduction. For 
such a problem, the LQG balanced truncation approach was developed in [72, 93, 137] 
which can also be applied to unstable systems and guarantees closed-loop stability 
with the reduced-order controller. This approach relies on the supply rate 


\[
w(u(t), y(t)) = y^T(t)y(t) + u^T(t)u(t)
\]

obtained from (2.25) by setting $Q = I_p$, $S = 0$ and $R = I_m$. For the linear quadratic
optimal regulator problem

\[
J(x_0) = \min_{\substack{u \in L_2(0, \infty; \mathbb{R}^m) \\ x(0) = x_0,\; \lim_{t \to \infty} x(t) = 0}}
  \int_0^\infty w(u(\tau), y(\tau))\,d\tau,
\]

the optimal cost is given by $J(x_0) = x_0^T Y_c x_0$, where $(Y_c, K_c, J_c)$ is the stabilizing
solution of the Lur'e equation

\[
\begin{bmatrix} A^T Y + YA + C^T C & YB + C^T D \\ B^T Y + D^T C & I + D^T D \end{bmatrix}
= \begin{bmatrix} K^T \\ J^T \end{bmatrix} \begin{bmatrix} K^T \\ J^T \end{bmatrix}^T.
\]

Furthermore, the stabilizing solution $(X_f, L_f, F_f)$ of the dual Lur'e equation

\[
\begin{bmatrix} AX + XA^T + BB^T & XC^T + BD^T \\ CX + DB^T & I + DD^T \end{bmatrix}
= \begin{bmatrix} L \\ F \end{bmatrix} \begin{bmatrix} L \\ F \end{bmatrix}^T
\]

can be used to describe the optimal filter cost. Note that the stabilizing solutions are
characterized by the property that all finite eigenvalues of the pencils

\[
\begin{bmatrix} sI - A & -B \\ -K_c & -J_c \end{bmatrix}, \qquad
\begin{bmatrix} sI - A & -L_f \\ -C & -F_f \end{bmatrix}
\]

have negative real part. The LQG Gramians can now be defined as

\[
X^{\mathrm{LQG}} = X_f, \qquad Y^{\mathrm{LQG}} = Y_c.
\]

In LQG balanced coordinates, it holds $X^{\mathrm{LQG}} = Y^{\mathrm{LQG}} = \mathrm{diag}(\sigma_1^{\mathrm{LQG}}, \ldots, \sigma_n^{\mathrm{LQG}})$,
where $\sigma_i^{\mathrm{LQG}}$ are called LQG characteristic values. The reduced-order model (2.2) is then
determined by projection onto the subspace corresponding to the $r$ dominant LQG characteristic
values. Since (2.1) and (2.2) are not necessarily asymptotically stable, the $\mathcal{H}_\infty$-norm of
the error $G - \hat{G}$ is generally not defined. In [91], an error bound in the gap metric was
presented. It is based on the normalized left coprime factorization $G(s) = M^{-1}(s)N(s)$,
where

\[
[N(s), M(s)] = [0, I] \begin{bmatrix} sI - A & -L_f \\ -C & -F_f \end{bmatrix}^{-1}
\begin{bmatrix} B & 0 \\ D & I \end{bmatrix}
\]

is a stable rational function. The LQG balanced truncation can then be interpreted
as the classical balanced truncation applied to the system $[N, M] \in \mathcal{H}_\infty$. The resulting
reduced-order system $[\hat{N}, \hat{M}]$ is stable and provides the normalized left coprime
factorization of $\hat{G}$, i.e., $\hat{G}(s) = \hat{M}^{-1}(s)\hat{N}(s)$. Then the gap metric error bound is given by

\[
\|[\hat{N}, \hat{M}] - [N, M]\|_{\mathcal{H}_\infty} \le 2 \sum_{i=r+1}^{n}
\frac{\sigma_i^{\mathrm{LQG}}}{\sqrt{1 + (\sigma_i^{\mathrm{LQG}})^2}};
\]

see [91] for the proof.


2.3.4 Stochastic balancing


Stochastic balanced truncation was first introduced in [38] for stochastic processes
and further studied in [55, 56, 63, 136]. This approach belongs to a class of relative
error methods which attempt to minimize the relative error $\|G^{-1}(G - \hat{G})\|_{\mathcal{H}_\infty}$. It does
not rely on a special form of the supply rate any more but rather on spectral factors as
defined below.

Let $\Phi(s) = Z(s) + Z^T(-s)$ be the power spectrum of the positive real transfer function
$Z(s) = C_Z(sI - A)^{-1}B_Z + D_Z$ with a minimal realization $(A, B_Z, C_Z, D_Z)$. Furthermore, we
consider the left spectral factor $V(s)$ and the right spectral factor $W(s)$ of $\Phi(s)$ satisfying

\[
\Phi(s) = V(s)V^T(-s) = W^T(-s)W(s).
\]

Solving the positive real Lur'e equations

\[
\begin{bmatrix} AX + XA^T & XC_Z^T - B_Z \\ C_Z X - B_Z^T & -D_Z - D_Z^T \end{bmatrix}
= -\begin{bmatrix} L_Z \\ F_Z \end{bmatrix} \begin{bmatrix} L_Z \\ F_Z \end{bmatrix}^T \tag{2.31}
\]

and

\[
\begin{bmatrix} A^T Y + YA & YB_Z - C_Z^T \\ B_Z^T Y - C_Z & -D_Z - D_Z^T \end{bmatrix}
= -\begin{bmatrix} K_Z^T \\ J_Z^T \end{bmatrix} \begin{bmatrix} K_Z^T \\ J_Z^T \end{bmatrix}^T \tag{2.32}
\]

for $(X, L_Z, F_Z)$ and $(Y, K_Z, J_Z)$, respectively, the spectral factors can be realized as

\[
\begin{aligned}
V(s) &= C_Z(sI - A)^{-1}L_Z + F_Z, && (2.33) \\
W(s) &= K_Z(sI - A)^{-1}B_Z + J_Z; && (2.34)
\end{aligned}
\]

see [56] for details. Then the balanced stochastic realization is obtained by performing
a state-space transformation of the realizations of $Z(s)$, $V(s)$ and $W(s)$ such that the
controllability Gramian of $V(s)$ is equal to the observability Gramian of $W(s)$. These
Gramians solve the Lyapunov equations

\[
\begin{aligned}
AX + XA^T &= -L_Z L_Z^T, && (2.35) \\
A^T Y + YA &= -K_Z^T K_Z, && (2.36)
\end{aligned}
\]

respectively.

Depending on whether the system $V(s)$ or $W(s)$ has to be reduced, one obtains two
different model reduction methods: left spectral factor balanced truncation and right
spectral factor balanced truncation. In the first method, given a square nonsingular
transfer function $G(s) = V(s)$ as in (2.33), where all eigenvalues of $A$ have negative real
part, we first calculate its controllability Gramian $X$ by solving (2.35). Then using (2.31),
we find

\[
B_Z = XC_Z^T + L_Z F_Z^T, \qquad D_Z + D_Z^T = F_Z F_Z^T.
\]




Inserting these matrices into the Lur'e equation (2.32), we determine, finally, the minimal
solution $Y_{\min}$ of (2.32). Note that this matrix is also the observability Gramian of
$W$ since it satisfies the Lyapunov equation (2.36). We use now the stochastic Gramians

\[
X^{\mathrm{ST}} = X, \qquad Y^{\mathrm{ST}} = Y_{\min}
\]

to define the stochastic balanced realization of $G(s) = V(s)$ such that $X^{\mathrm{ST}}$ and $Y^{\mathrm{ST}}$ are
equal and diagonal, i.e., $X^{\mathrm{ST}} = Y^{\mathrm{ST}} = \mathrm{diag}(\sigma_1^{\mathrm{ST}}, \ldots, \sigma_n^{\mathrm{ST}})$. Reducing this realization by
truncating the states corresponding to small stochastic characteristic values $\sigma_i^{\mathrm{ST}}$, we
obtain an approximation $\hat{G}(s)$ which satisfies the relative error bound

\[
\|G^{-1}(G - \hat{G})\|_{\mathcal{H}_\infty} \le \prod_{i=r+1}^{n}
\frac{1 + \sigma_i^{\mathrm{ST}}}{1 - \sigma_i^{\mathrm{ST}}} - 1
\]

derived in [56]. Note that the stochastic characteristic values satisfy $0 < \sigma_i^{\mathrm{ST}} \le 1$.
Moreover, if $r$ exceeds the number of $\sigma_i^{\mathrm{ST}}$ with $\sigma_i^{\mathrm{ST}} = 1$, then $\hat{G}(s)$ preserves zeros of $G(s)$
in $\mathbb{C}^+$; see [55]. This property immediately implies that, if $G(s)$ is minimum phase, i.e.,
it has no zeros in $\mathbb{C}^+$, then $\hat{G}(s)$ is also minimum phase.
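
The relative error bound is cheap to evaluate once the stochastic characteristic values are
known, which makes it a convenient tool for choosing the reduced order. The following helper
is a small sketch; the function name `relative_error_bound` and the sample values are
assumptions, and the formula is only meaningful when all truncated values are strictly below one.

```python
import numpy as np


def relative_error_bound(sigma_st, r):
    """Evaluate prod_{i>r} (1 + s_i) / (1 - s_i) - 1 for the truncated values s_i."""
    tail = np.asarray(sigma_st[r:], dtype=float)
    if np.any(tail >= 1.0):
        raise ValueError("bound is undefined for truncated values equal to one")
    return float(np.prod((1.0 + tail) / (1.0 - tail)) - 1.0)


# Assumed stochastic characteristic values, ordered decreasingly.
sigma_st = np.array([0.9, 0.5, 0.1, 0.01])
bounds = [relative_error_bound(sigma_st, r) for r in range(len(sigma_st) + 1)]
```

Here `bounds[r]` is the guaranteed relative accuracy of a model of order `r`. Keeping all four
states gives a bound of zero, and truncating only the smallest value gives $101/99 - 1 = 2/99$.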

For given $G(s) = W(s)$ as in (2.34), the stochastic balanced realization can be determined
by first solving the Lyapunov equation (2.36) for $Y$ and then computing the
minimal solution $X_{\min}$ of the Lur'e equation (2.31) with

\[
C_Z = B_Z^T Y + J_Z^T K_Z, \qquad D_Z^T + D_Z = J_Z^T J_Z
\]

obtained from (2.32). Balancing the Gramian pair $(X_{\min}, Y)$ and performing model reduction,
we obtain an approximation $\hat{G}(s)$ with similar properties as in the previous
method.

The balancing-related model reduction methods considered above require solving 
Lur’e or Riccati matrix equations. Numerical methods for large-scale Lur’e equations 
presented in [102, 103] are based on deflating subspaces of an associated even matrix 
pencil. Furthermore, an extension of the ADI method to Lur’e equations was proposed 
in [85]. For Riccati equations, many different numerical methods have been developed 
over the last 30 years; see [30]. Let us just mention some of them relying on a low-rank 
approximation: Krylov subspace methods [67, 121], Newton’s method [19, 24, 28], and 
ADI-type methods [12, 81, 84]. 
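
For small dense problems, a Riccati equation can also be solved through the stable invariant
subspace of the associated Hamiltonian matrix. The sketch below uses this textbook approach,
not one of the low-rank methods cited above, to compute LQG-type Gramians for an unstable
example. It assumes $D = 0$, so that the LQG Lur'e equations reduce to
$A^TY + YA - YBB^TY + C^TC = 0$ and $AX + XA^T - XC^TCX + BB^T = 0$; the function name
`care_hamiltonian` and the test system are hypothetical.

```python
import numpy as np


def care_hamiltonian(A, G, Q):
    """Stabilizing solution P of A^T P + P A - P G P + Q = 0 (dense, small n only)."""
    n = A.shape[0]
    H = np.block([[A, -G], [-Q, -A.T]])          # Hamiltonian matrix
    evals, evecs = np.linalg.eig(H)
    stable = evecs[:, evals.real < 0]            # basis of the stable invariant subspace
    U1, U2 = stable[:n, :], stable[n:, :]
    P = np.real(U2 @ np.linalg.inv(U1))
    return (P + P.T) / 2


# Assumed unstable but stabilizable and detectable system with D = 0.
A = np.array([[1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

Y = care_hamiltonian(A, B @ B.T, C.T @ C)        # regulator Riccati equation
X = care_hamiltonian(A.T, C.T @ C, B @ B.T)      # filter Riccati equation

res_Y = np.linalg.norm(A.T @ Y + Y @ A - Y @ B @ B.T @ Y + C.T @ C)
res_X = np.linalg.norm(A @ X + X @ A.T - X @ C.T @ C @ X + B @ B.T)

# LQG characteristic values as square roots of the eigenvalues of X Y.
sigma_lqg = np.sqrt(np.sort(np.linalg.eigvals(X @ Y).real)[::-1])
closed_loop = np.linalg.eigvals(A - B @ B.T @ Y)
filter_loop = np.linalg.eigvals(A - X @ C.T @ C)
```

Although `A` has an eigenvalue at $+1$, both Riccati solutions exist and render `A - B B^T Y` and
`A - X C^T C` stable, which is the reason why LQG balancing is applicable to unstable systems.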


2.3.5 Singular perturbation approximation 


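Another variant of classical balancing relies on partitioning the balanced system (2.17)
into slow and fast dynamics. Instead of setting $x_2 = 0$ and obtaining the reduced-order
model via (2.18), we assume that the dynamics of $x_2$ have reached a steady state such
that $\dot{x}_2(t) = 0$. As a consequence, the partitioning (2.17) allows us to solve the second
equation explicitly for the variable $x_2$. In particular, we find that

\[
x_2(t) = -A_{22}^{-1}A_{21}x_1(t) - A_{22}^{-1}B_2 u(t).
\]

Inserting this expression into the first equation leads to

\[
\begin{aligned}
\dot{x}_1(t) &= (A_{11} - A_{12}A_{22}^{-1}A_{21})x_1(t) + (B_1 - A_{12}A_{22}^{-1}B_2)u(t), \\
\hat{y}(t) &= (C_1 - C_2A_{22}^{-1}A_{21})x_1(t) - C_2A_{22}^{-1}B_2u(t) + Du(t),
\end{aligned} \tag{2.37}
\]

from which we obtain another reduced-order model of the form

\[
\begin{aligned}
\hat{A} &= A_{11} - A_{12}A_{22}^{-1}A_{21}, & \hat{B} &= B_1 - A_{12}A_{22}^{-1}B_2, \\
\hat{C} &= C_1 - C_2A_{22}^{-1}A_{21}, & \hat{D} &= -C_2A_{22}^{-1}B_2 + D.
\end{aligned}
\]

Let us emphasize that, for a controllable and observable linear system (2.17), the matrix
$A_{22}$ is guaranteed to be regular if $\Sigma_1$ and $\Sigma_2$ have no common diagonal entries; see
[83]. It has also been shown in [83] that the reduced-order model (2.37) satisfies the a
priori error bound (2.19). Let us also mention the interesting fact that singular perturbation
approximation can be interpreted as classical balanced truncation applied to
the reciprocal system

\[
\tilde{A} = A^{-1}, \qquad \tilde{B} = A^{-1}B, \qquad \tilde{C} = CA^{-1}.
\]

Moreover, for the transfer function of the original system, we find that

\[
\begin{aligned}
G(0) &= -[C_1, C_2] \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
  \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} + D \\
&= -[C_1, C_2] \begin{bmatrix}
  \hat{A}^{-1} & -\hat{A}^{-1}A_{12}A_{22}^{-1} \\
  -A_{22}^{-1}A_{21}\hat{A}^{-1} & A_{22}^{-1}A_{21}\hat{A}^{-1}A_{12}A_{22}^{-1} + A_{22}^{-1}
  \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} + D \\
&= -[C_1, C_2] \begin{bmatrix}
  \hat{A}^{-1}\hat{B} \\ -A_{22}^{-1}A_{21}\hat{A}^{-1}\hat{B} + A_{22}^{-1}B_2
  \end{bmatrix} + D \\
&= -\hat{C}\hat{A}^{-1}\hat{B} + \hat{D} = \hat{G}(0).
\end{aligned}
\]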


The above relation additionally implies that singular perturbation approximation 
yields a reduced-order model that is exact at the frequency s = 0. We refer to the 
original reference [83], where more details of the method can be found. 
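
The exactness at $s = 0$ is a purely algebraic consequence of the Schur complement structure and
can be checked without balancing the system first. The sketch below, with an assumed test
system and a hypothetical function name `spa`, builds the reduced matrices of (2.37) from a
given partition and compares the steady-state gains.

```python
import numpy as np


def spa(A, B, C, D, r):
    """Singular perturbation approximation keeping the first r states."""
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    B1, B2 = B[:r], B[r:]
    C1, C2 = C[:, :r], C[:, r:]
    A22inv_A21 = np.linalg.solve(A22, A21)
    A22inv_B2 = np.linalg.solve(A22, B2)
    Ar = A11 - A12 @ A22inv_A21
    Br = B1 - A12 @ A22inv_B2
    Cr = C1 - C2 @ A22inv_A21
    Dr = D - C2 @ A22inv_B2
    return Ar, Br, Cr, Dr


def dc_gain(A, B, C, D):
    """Transfer function value G(0) = -C A^{-1} B + D."""
    return D - C @ np.linalg.solve(A, B)


# Assumed stable test system (not balanced; the identity G(0) = G_hat(0) holds regardless).
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.5], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 1.0, 1.0]])
D = np.array([[0.1]])

Ar, Br, Cr, Dr = spa(A, B, C, D, r=2)
gain_full = dc_gain(A, B, C, D)
gain_reduced = dc_gain(Ar, Br, Cr, Dr)
```

In practice the partition would be taken from the balanced realization (2.17), so that the
error bound (2.19) applies in addition to the matching steady-state gain.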


2.3.6 Cross-Gramian balanced truncation 


With the intention of studying controllability and observability concepts at the same
time, in [44] the authors introduced (for the SISO case) the so-called cross-Gramian

\[
X_c = \int_0^\infty e^{At}BCe^{At}\,dt.
\]

Generalizations to symmetric MIMO systems were subsequently considered in [45].
Similar to the controllability and observability Gramians, the matrix $X_c$ can be shown
to satisfy a Sylvester equation,

\[
AX_c + X_cA + BC = 0.
\]

Moreover, the eigenvalues of the cross-Gramian are invariant under state space
transformations and coincide in absolute value with the Hankel singular values of system (2.1).
As a consequence, it is possible to perform the balancing step with respect to $X_c$ instead of $X$
and Y. The main advantage is that only one linear matrix equation has to be solved 
which allows for a computationally more tractable method; see, e. g., [7, 124]. Fur- 
thermore, the cross-Gramian and some empirical variants thereof can be computed 
by means of simulated trajectories of the system; see [68] for a more detailed overview 
and comparison. 
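
For a SISO example, the relation between the cross-Gramian and the two classical Gramians can
be verified directly. The following sketch is an illustration under assumptions: the system is
made up, both matrix equations are solved by hypothetical dense Kronecker helpers
(`sylvester_kron`, `lyap_kron`) that are only suitable for tiny $n$, and the check relies on the
known SISO identity $X_c^2 = XY$.

```python
import numpy as np


def sylvester_kron(A, B, C):
    """Solve A Xc + Xc A + B C = 0 by vectorization (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(A.T, I)     # vec(A Xc) + vec(Xc A)
    x = np.linalg.solve(K, -(B @ C).flatten(order="F"))
    return x.reshape((n, n), order="F")


def lyap_kron(A, G):
    """Solve A X + X A^T = G by vectorization (small n only)."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    return np.linalg.solve(K, G.flatten(order="F")).reshape((n, n), order="F")


# Assumed stable SISO test system.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.5], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 1.0, 1.0]])

Xc = sylvester_kron(A, B, C)
X = lyap_kron(A, -B @ B.T)
Y = lyap_kron(A.T, -C.T @ C)

sylvester_residual = np.linalg.norm(A @ Xc + Xc @ A + B @ C)
square_gap = np.linalg.norm(Xc @ Xc - X @ Y)

hsv = np.sort(np.sqrt(np.abs(np.linalg.eigvals(X @ Y))))   # Hankel singular values
xc_abs = np.sort(np.abs(np.linalg.eigvals(Xc)))            # |eigenvalues| of Xc
```

Only one matrix equation is needed to obtain `xc_abs`, whereas `hsv` requires both Gramians.
The eigenvalues of `Xc` themselves may be negative, which is why absolute values are compared.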


2.4 Balancing for DAEs 


Control problems governed by DAEs arise in a variety of practical applications includ- 
ing circuit simulation, computational electromagnetics, fluid dynamics, mechanical 
and chemical engineering; see [18, Chapters 2, 4 and 5] for practical examples. Unlike 
ordinary differential equations (ODEs), DAEs contain (hidden) algebraic constraints 
restricting the solution to a manifold. These constraints usually result from physical 
laws as, for example, conservation laws in incompressible (Navier—)Stokes equations 
and Maxwell’s equations, and Kirchhoff’s laws in network problems, or from geomet- 
ric and kinematic constraints in mechanical systems. For solvability of DAE control 
systems, it is required that the initial values satisfy certain consistency conditions im- 
posed by algebraic constraints and that the input function or some of its components 
are sufficiently smooth. In [31, 76, 77], different frameworks have been presented for 
structural analysis and numerical treatment of linear and nonlinear DAEs. Also, dif- 
ferent index concepts have been introduced there to characterize various structural 
properties of DAEs. In general, a high index characterizes the difficulty of analyzing a 
DAE theoretically and numerically. 
We consider a linear DAE control system,

\[
\begin{aligned}
E\dot{x}(t) &= Ax(t) + Bu(t), \\
y(t) &= Cx(t) + Du(t),
\end{aligned} \tag{2.38}
\]

where $E, A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$. If $E$ is nonsingular, this system
can be transformed into the standard state-space form (2.1) by inversion of $E$. Control
systems of the form (2.38) with a singular matrix $E$ are also known as descriptor
systems, generalized state-space systems or singular systems. Model reduction of such
systems by balanced truncation was first considered in [99] and further improved in
[82, 127]. We also refer to [29] for a comprehensive survey on model reduction of DAEs.

Assuming that a pencil $\lambda E - A$ is regular, i.e., $\det(\lambda E - A) \ne 0$ for some $\lambda \in \mathbb{C}$,
we can introduce a transfer function for (2.38) given by $G(s) = C(sE - A)^{-1}B + D$. Comparing
it with the transfer function (2.5) of the standard state-space system (2.1), one
might think that all control-theoretic concepts and related model reduction methods 
for ODEs can be extended to DAEs just by replacing the identity matrix with E. How- 
ever, in practice, it does not work, since DAEs, especially higher index DAEs, exhibit 
much more complex behavior than ODEs. 

A useful tool for investigating the structural properties of linear DAEs is a Weierstrass
canonical form. For a regular pencil $\lambda E - A$, it is given by

\[
E = T_l \begin{bmatrix} I_{n_f} & 0 \\ 0 & N \end{bmatrix} T_r, \qquad
A = T_l \begin{bmatrix} J & 0 \\ 0 & I_{n_\infty} \end{bmatrix} T_r, \tag{2.39}
\]

where $T_l, T_r \in \mathbb{R}^{n \times n}$ are nonsingular matrices and $N \in \mathbb{R}^{n_\infty \times n_\infty}$ is nilpotent with nilpotency
index $\nu$ defined as the smallest integer such that $N^\nu = 0$. This quantity also defines
the (differentiation) index of the DAE system (2.38). The eigenvalues of $J \in \mathbb{R}^{n_f \times n_f}$ are
the finite eigenvalues of the pencil $\lambda E - A$, and $N$ corresponds to the eigenvalue at
infinity. Using the Weierstrass canonical form (2.39), we can decompose (2.38) into the 
differential part (also called the slow subsystem) and the algebraic part (also called 
the fast subsystem). The differential part is in the ODE form and, therefore, it can be 
reduced in a standard way by balancing and truncation. On the contrary, the algebraic 
part determines a solution manifold which has to be preserved in the reduced-order 
model. This can be achieved if we only remove redundant equations and those state 
components which do not contribute to the input-output energy transfer. As a result, 
we get a minimal realization of the algebraic part. 
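
The splitting into a differential and an algebraic part can be made tangible on a constructed
example. In the sketch below, the pencil is assembled from an assumed Weierstrass form with
hand-picked $T_l$, $T_r$, $J$ and $N$ (in practice these factors are not available and are not
computed explicitly), and $D = 0$ is assumed. The code determines the nilpotency index and
checks that the transfer function is the sum of a strictly proper part and a polynomial part,
using $(sN - I)^{-1} = -\sum_{k=0}^{\nu-1} s^k N^k$.

```python
import numpy as np


def nilpotency_index(N, tol=1e-12):
    """Smallest nu with N^nu = 0 (raises if N is not nilpotent)."""
    P = np.eye(N.shape[0])
    for nu in range(1, N.shape[0] + 1):
        P = P @ N
        if np.linalg.norm(P) <= tol:
            return nu
    raise ValueError("matrix is not nilpotent")


# Assumed Weierstrass data: two finite eigenvalues and an index-2 block at infinity.
J = np.diag([-1.0, -2.0])
N = np.array([[0.0, 1.0], [0.0, 0.0]])
Tl = np.array([[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]])
Tr = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
               [0.0, 2.0, 1.0, 0.0], [1.0, 0.0, 1.0, 1.0]])
I2, Z2 = np.eye(2), np.zeros((2, 2))
E = Tl @ np.block([[I2, Z2], [Z2, N]]) @ Tr
A = Tl @ np.block([[J, Z2], [Z2, I2]]) @ Tr
B = np.array([[1.0], [0.0], [2.0], [1.0]])
C = np.array([[1.0, 1.0, 0.0, 1.0]])

nu = nilpotency_index(N)
s = 0.7 + 1.3j
pencil_det = np.linalg.det(s * E - A)            # nonzero for a regular pencil
G = C @ np.linalg.solve(s * E - A, B)

# Transformed input and output matrices, split conformally with (2.39).
Bt = np.linalg.solve(Tl, B)
Ct = C @ np.linalg.inv(Tr)
B1, B2 = Bt[:2], Bt[2:]
C1, C2 = Ct[:, :2], Ct[:, 2:]
G_slow = C1 @ np.linalg.solve(s * I2 - J, B1)    # strictly proper (differential) part
G_fast = -sum(s ** k * C2 @ np.linalg.matrix_power(N, k) @ B2 for k in range(nu))
```

The polynomial part `G_fast` grows with `s` when the index exceeds one, which is the reason why
states of the algebraic part cannot simply be truncated like small Hankel singular values.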

For the differential part, we introduce the proper controllability and observability
Gramians $X_{\mathrm{pr}}$ and $Y_{\mathrm{pr}}$ as symmetric, positive semi-definite solutions of the projected
continuous-time Lyapunov equations

\[
\begin{aligned}
AX_{\mathrm{pr}}E^T + EX_{\mathrm{pr}}A^T &= -P_l BB^T P_l^T, & X_{\mathrm{pr}} &= P_r X_{\mathrm{pr}} P_r^T, \\
A^T Y_{\mathrm{pr}}E + E^T Y_{\mathrm{pr}}A &= -P_r^T C^T C P_r, & Y_{\mathrm{pr}} &= P_l^T Y_{\mathrm{pr}} P_l,
\end{aligned} \tag{2.40}
\]

where

\[
P_l = T_l \begin{bmatrix} I_{n_f} & 0 \\ 0 & 0 \end{bmatrix} T_l^{-1}, \qquad
P_r = T_r^{-1} \begin{bmatrix} I_{n_f} & 0 \\ 0 & 0 \end{bmatrix} T_r
\]

are the spectral projectors onto the left and right deflating subspaces of the pencil
$\lambda E - A$ corresponding to the finite eigenvalues. Such Gramians exist and are unique if
the DAE system (2.38) is asymptotically stable, i.e., all finite eigenvalues of $\lambda E - A$ have
negative real part. The proper Gramians have a similar energy interpretation as the
Gramians in the standard state-space case [127]. For the algebraic part, we define the
improper controllability and observability Gramians $X_{\mathrm{im}}$ and $Y_{\mathrm{im}}$ as symmetric, positive
semi-definite solutions of the projected discrete-time Lyapunov equations

\[
\begin{aligned}
AX_{\mathrm{im}}A^T - EX_{\mathrm{im}}E^T &= Q_l BB^T Q_l^T, & X_{\mathrm{im}} &= Q_r X_{\mathrm{im}} Q_r^T, \\
A^T Y_{\mathrm{im}}A - E^T Y_{\mathrm{im}}E &= Q_r^T C^T C Q_r, & Y_{\mathrm{im}} &= Q_l^T Y_{\mathrm{im}} Q_l,
\end{aligned} \tag{2.41}
\]

where $Q_l = I - P_l$ and $Q_r = I - P_r$ are the spectral projectors onto the left and right
deflating subspaces of $\lambda E - A$ corresponding to the eigenvalue at infinity. These two pairs
of the Gramians provide two sets of Hankel singular values of the DAE system (2.38).
Let

\[
\begin{aligned}
X_{\mathrm{pr}} &= L_{X,\mathrm{pr}}^T L_{X,\mathrm{pr}}, & Y_{\mathrm{pr}} &= L_{Y,\mathrm{pr}}^T L_{Y,\mathrm{pr}}, \\
X_{\mathrm{im}} &= L_{X,\mathrm{im}}^T L_{X,\mathrm{im}}, & Y_{\mathrm{im}} &= L_{Y,\mathrm{im}}^T L_{Y,\mathrm{im}}
\end{aligned}
\]

be the Cholesky factorizations of the Gramians. Then the proper Hankel singular
values, denoted by $\sigma_i$, are defined as the largest $n_f$ singular values of the matrix
$L_{X,\mathrm{pr}} E^T L_{Y,\mathrm{pr}}^T$, and the improper Hankel singular values, denoted by $\theta_i$, are defined as
the largest $n_\infty$ singular values of the matrix $L_{X,\mathrm{im}} A^T L_{Y,\mathrm{im}}^T$. The SVDs of these matrices
can be used to transform (2.38) into a balanced form such that the Gramians of the
transformed system satisfy

\[
X_{\mathrm{pr}} + X_{\mathrm{im}} = Y_{\mathrm{pr}} + Y_{\mathrm{im}}
= \mathrm{diag}(\sigma_1, \ldots, \sigma_{n_f}, \theta_1, \ldots, \theta_{n_\infty}).
\]


Then a reduced-order model can be computed by truncation of the state components 
of the balanced system corresponding to small proper Hankel singular values and 
zero improper Hankel singular values. Note that the truncation of the states corre- 
sponding to non-zero improper Hankel singular values may lead to an inaccurate and 
physically meaningless approximation; see [82, 130] for some examples. An extension 
of the square root balanced truncation method to DAE control systems is presented in 
Algorithm 2.2. 

One can show that the resulting reduced-order system is balanced, asymptotically 
stable and has an index which does not exceed the index of the original system [127]. 
Moreover, we have the error estimate

\[
\|G - \hat{G}\|_{\mathcal{H}_\infty} \le 2 \sum_{i=r_f+1}^{n_f} \sigma_i.
\]


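Algorithm 2.2: Balanced truncation for DAE control systems.

Require: Original system $E, A, B, C, D$; error tolerance tol.

1: Compute the solutions $X_{\mathrm{pr}} = L_{X,\mathrm{pr}}^T L_{X,\mathrm{pr}}$ and $Y_{\mathrm{pr}} = L_{Y,\mathrm{pr}}^T L_{Y,\mathrm{pr}}$ of (2.40).
2: Compute the solutions $X_{\mathrm{im}} = L_{X,\mathrm{im}}^T L_{X,\mathrm{im}}$ and $Y_{\mathrm{im}} = L_{Y,\mathrm{im}}^T L_{Y,\mathrm{im}}$ of (2.41).
3: Compute the SVDs

\[
\begin{aligned}
L_{X,\mathrm{pr}} E^T L_{Y,\mathrm{pr}}^T &= [U_{\mathrm{pr},1}, U_{\mathrm{pr},2}]
  \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
  \begin{bmatrix} Z_{\mathrm{pr},1}^T \\ Z_{\mathrm{pr},2}^T \end{bmatrix}, \\
L_{X,\mathrm{im}} A^T L_{Y,\mathrm{im}}^T &= [U_{\mathrm{im},1}, U_{\mathrm{im},2}]
  \begin{bmatrix} \Theta_1 & 0 \\ 0 & 0 \end{bmatrix}
  \begin{bmatrix} Z_{\mathrm{im},1}^T \\ Z_{\mathrm{im},2}^T \end{bmatrix},
\end{aligned}
\]

   where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_{r_f})$ and $\Sigma_2 = \mathrm{diag}(\sigma_{r_f+1}, \ldots, \sigma_{n_f})$ with $r_f$ chosen such that
   $2 \sum_{i=r_f+1}^{n_f} \sigma_i \le$ tol, and $\Theta_1 = \mathrm{diag}(\theta_1, \ldots, \theta_{r_\infty})$ is positive definite.
4: Define

\[
\begin{aligned}
V &= [L_{X,\mathrm{pr}}^T U_{\mathrm{pr},1} \Sigma_1^{-1/2},\; L_{X,\mathrm{im}}^T U_{\mathrm{im},1} \Theta_1^{-1/2}], \\
W &= [L_{Y,\mathrm{pr}}^T Z_{\mathrm{pr},1} \Sigma_1^{-1/2},\; L_{Y,\mathrm{im}}^T Z_{\mathrm{im},1} \Theta_1^{-1/2}].
\end{aligned}
\]

5: Define $\hat{E} = W^T E V$, $\hat{A} = W^T A V$, $\hat{B} = W^T B$, $\hat{C} = CV$ and $\hat{D} = D$.

For solving the projected continuous-time Lyapunov equations (2.40), one can use the
low-rank generalized ADI method [129] or (rational) Krylov subspace methods [131].
The projected discrete-time Lyapunov equations (2.41) can be solved using the
generalized Smith method [129]. The main difficulty in all these methods is the determination
of the spectral projectors $P_l$ and $P_r$. For some special classes of linear DAEs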
such as semi-explicit DAEs of index 1 [46, 111], magneto-quasistatic systems of index 1 
[73], Stokes-like DAEs of index 2 [66, 128], and constrained mechanical systems of in- 
dex 2 and 3 [29, 114], it is possible to avoid the explicit construction of the projectors. 
By making use of a special block structure of the system matrices, such systems can 
be transformed into the ODE form allowing the application of standard model reduc- 
tion methods. It should, however, be emphasized that the resulting ODE systems do 
not preserve the sparsity in the matrix coefficients. Therefore, the ODE systems will 
never be computed explicitly. Instead, again exploiting the system structure, all com- 
putations can be performed in terms of the original data. For other structured DAEs 
such as circuit equations of index 1 and 2, the required projectors have been derived 
in [107, 109]. In numerical implementations, however, only projector-vector products 
will be computed, avoiding explicitly forming the (possibly full) projector matrices 
and significantly reducing computational costs. 

We conclude this section by referring to [29, 87, 108] for an extension of other types
of balancing-based model reduction methods to DAE control systems.


2.5 Balancing for nonlinear systems


A generalization of the concept of balancing to the nonlinear case was introduced
in [117]. To summarize the main idea, consider a nonlinear system of the form

\[
\begin{aligned}
\dot{x}(t) &= f(x(t)) + g(x(t))u(t), \\
y(t) &= h(x(t)),
\end{aligned}
\tag{2.42}
\]

where $f\colon \mathbb{R}^n \to \mathbb{R}^n$, $g\colon \mathbb{R}^n \to \mathbb{R}^{n \times m}$ and $h\colon \mathbb{R}^n \to \mathbb{R}^p$ are smooth functions. Additionally,
zero is assumed to be an equilibrium, i.e., $f(0) = 0$ and $h(0) = 0$. In analogy to the
linear case, Scherpen defined the energy functionals


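\[
\begin{aligned}
L_c(x_0) &= \min_{\substack{u \in L_2(-\infty,0;\mathbb{R}^m) \\ x(-\infty)=0,\; x(0)=x_0}} \frac{1}{2} \int_{-\infty}^{0} u(t)^T u(t)\, dt, \\
L_o(x_0) &= \frac{1}{2} \int_{0}^{\infty} y(t)^T y(t)\, dt, \qquad x(0) = x_0, \quad u(t) \equiv 0, \quad 0 \le t < \infty,
\end{aligned}
\tag{2.43}
\]

which are defined to be infinite if $x_0$ cannot be reached from 0 or if the system is
unstable, respectively. While the Gramians $X$ and $Y$ of a linear system satisfy the linear
matrix equations (2.13) and (2.14), under suitable solvability assumptions $L_c$ and $L_o$
solve the partial differential equations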


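\[
\begin{aligned}
\frac{\partial L_c}{\partial x}(x) f(x) + \frac{1}{2} \frac{\partial L_c}{\partial x}(x)\, g(x) g(x)^T \Bigl( \frac{\partial L_c}{\partial x}(x) \Bigr)^T &= 0, & L_c(0) &= 0, \\
\frac{\partial L_o}{\partial x}(x) f(x) + \frac{1}{2} h(x)^T h(x) &= 0, & L_o(0) &= 0,
\end{aligned}
\]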


for all $x$ in a neighborhood of the origin; see [117, Theorem 3.2]. The idea of
transforming the system coordinates into balanced form and subsequently obtaining a
reduced-order system by truncation remains the same. However, in general this can
only be achieved locally (around the origin) by a nonlinear coordinate transform
$x = \psi(\tilde{x})$, $\psi(0) = 0$. For the transformed balanced functionals $\tilde{L}_c$ and $\tilde{L}_o$, it then holds

\[
\tilde{L}_c(\tilde{x}) = \frac{1}{2}\, \tilde{x}^T \operatorname{diag}\bigl(\sigma_1(\tilde{x})^{-1}, \ldots, \sigma_n(\tilde{x})^{-1}\bigr)\, \tilde{x},
\qquad
\tilde{L}_o(\tilde{x}) = \frac{1}{2}\, \tilde{x}^T \operatorname{diag}\bigl(\sigma_1(\tilde{x}), \ldots, \sigma_n(\tilde{x})\bigr)\, \tilde{x},
\]

where the so-called singular value functions $\sigma_1(\tilde{x}) \ge \cdots \ge \sigma_n(\tilde{x})$ extend the notion of
the Hankel singular values to the nonlinear setting. In particular, it can be shown that
a linearization of this approach yields the classical balancing technique applied to the
linearization (around 0) of the nonlinear system (2.42). For more details on theoretical
properties such as local asymptotic stability or preservation of balanced coordinates,
we refer to the original reference [117].
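As a consistency check, the linear case can be worked out by hand. The sketch below assumes that (2.13) and (2.14) are the standard Lyapunov equations $AX + XA^T + BB^T = 0$ and $A^TY + YA + C^TC = 0$ with $X$ nonsingular, and that the two partial differential equations have the Hamilton-Jacobi form used by Scherpen; with $f(x) = Ax$, $g(x) = B$, $h(x) = Cx$ and quadratic candidates for $L_c$ and $L_o$, they reduce to exactly these matrix equations.

```latex
\[
\begin{aligned}
L_o(x) = \tfrac{1}{2} x^T Y x
&\;\Longrightarrow\; \frac{\partial L_o}{\partial x}(x) = x^T Y, \\
x^T Y A x + \tfrac{1}{2} x^T C^T C x = 0 \ \ \forall x
&\;\Longleftrightarrow\; A^T Y + Y A + C^T C = 0, \\[4pt]
L_c(x) = \tfrac{1}{2} x^T X^{-1} x
&\;\Longrightarrow\; \frac{\partial L_c}{\partial x}(x) = x^T X^{-1}, \\
x^T X^{-1} A x + \tfrac{1}{2} x^T X^{-1} B B^T X^{-1} x = 0 \ \ \forall x
&\;\Longleftrightarrow\; X^{-1} A + A^T X^{-1} + X^{-1} B B^T X^{-1} = 0 \\
&\;\Longleftrightarrow\; A X + X A^T + B B^T = 0 .
\end{aligned}
\]
```

In each case the equivalence uses that a quadratic form vanishes identically exactly when the symmetric part of its matrix is zero; the last step multiplies by $X$ from the left and from the right.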

Since computing $L_c$ and $L_o$ from (2.43), exactly or approximately,
suffers from the curse of dimensionality, recent work has focused on replacing $L_c$ and
$L_o$ by algebraic approximations. Particularly for the class of bilinear control systems

\[
\begin{aligned}
\dot{x}(t) &= Ax(t) + \sum_{i=1}^{m} N_i x(t) u_i(t) + Bu(t), \\
y(t) &= Cx(t),
\end{aligned}
\tag{2.44}
\]

with $A, N_i \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$ and $C \in \mathbb{R}^{p \times n}$, below we summarize an alternative way of
generalizing model reduction by balanced truncation. Let us emphasize that bilinear
systems not only arise in several practical applications such as nuclear fission, biology
[88], the Fokker-Planck equation [64] and heat transfer processes [41], but can also
be used to approximate more general nonlinear systems by a Carleman linearization;
see [112]. Controllability and observability concepts for bilinear control systems have
already been studied in [35, 71]. Their use in the context of model reduction was first
discussed in [1, 2] and is based on a generalization of the integral representations (2.9)
and (2.12). In particular, consider the following recursive series of time-dependent
matrices:


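\[
\begin{aligned}
X_1(t_1) &= e^{At_1} B, \\
X_k(t_1, \ldots, t_k) &= \bigl[ e^{At_k} N_1 X_{k-1} \;\; \cdots \;\; e^{At_k} N_m X_{k-1} \bigr], \qquad k = 2, 3, \ldots,
\end{aligned}
\]

as well as

\[
\begin{aligned}
Y_1(t_1) &= C e^{At_1}, \\
Y_k(t_1, \ldots, t_k) &= \begin{bmatrix} Y_{k-1} N_1 \\ \vdots \\ Y_{k-1} N_m \end{bmatrix} e^{At_k}, \qquad k = 2, 3, \ldots,
\end{aligned}
\]

where $X_{k-1} = X_{k-1}(t_1, \ldots, t_{k-1})$ and $Y_{k-1} = Y_{k-1}(t_1, \ldots, t_{k-1})$, and define

\[
\begin{aligned}
X &= \sum_{k=1}^{\infty} \int_0^\infty \!\!\cdots\! \int_0^\infty X_k(t_1, \ldots, t_k)\, X_k(t_1, \ldots, t_k)^T \, dt_1 \cdots dt_k, \\
Y &= \sum_{k=1}^{\infty} \int_0^\infty \!\!\cdots\! \int_0^\infty Y_k(t_1, \ldots, t_k)^T\, Y_k(t_1, \ldots, t_k) \, dt_1 \cdots dt_k.
\end{aligned}
\tag{2.45}
\]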


If $X$ and $Y$ exist, it can be shown, see, e.g., [1, Theorem 1], that they satisfy the
generalized Lyapunov equations

\[
\begin{aligned}
AX + XA^T + \sum_{i=1}^{m} N_i X N_i^T + BB^T &= 0, \\
A^T Y + YA + \sum_{i=1}^{m} N_i^T Y N_i + C^T C &= 0.
\end{aligned}
\tag{2.46}
\]

The main advantage of using these algebraic Gramians $X$ and $Y$ is that a global static
coordinate transform $\tilde{x} = Tx$ can be used to transform (2.44) into a balanced form.
The construction is completely analogous to the linear case, i.e., the transformation
matrix $T \in \mathbb{R}^{n \times n}$ can be obtained via $T = \Sigma^{-1/2} Z^T L_Y$, where

\[
X = L_X^T L_X, \qquad Y = L_Y^T L_Y, \qquad L_X L_Y^T = U \Sigma Z^T.
\]


We summarize the main steps of balanced truncation for bilinear control systems in 
Algorithm 2.3. 


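Algorithm 2.3: Balanced truncation for bilinear control systems.

Require: Original system $A$, $B$, $C$ and $N_1, \ldots, N_m$.
Ensure: Reduced system $\hat{A}$, $\hat{B}$, $\hat{C}$ and $\hat{N}_1, \ldots, \hat{N}_m$.

1: Compute the solutions $X = L_X^T L_X$ and $Y = L_Y^T L_Y$ of (2.46).
2: Compute the SVD of $L_X L_Y^T$ as in (2.21).
3: Define $V = L_X^T U_1 \Sigma_1^{-1/2}$ and $W = L_Y^T Z_1 \Sigma_1^{-1/2}$.
4: Define $\hat{A} = W^T A V$, $\hat{B} = W^T B$, $\hat{C} = C V$ and $\hat{N}_i = W^T N_i V$, $i = 1, \ldots, m$.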
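The steps of Algorithm 2.3 can be sketched for a very small system as follows. This is an illustration under stated assumptions rather than a practical implementation: the generalized Lyapunov equations (2.46) are solved by dense Kronecker vectorization (which scales like $n^6$), the factors $L_X$, $L_Y$ come from a symmetric eigendecomposition instead of a Cholesky or low-rank method, and the test matrices and all function names are hypothetical.

```python
import numpy as np


def bilinear_gramian(A, N_list, B):
    """Solve A X + X A^T + sum_i N_i X N_i^T + B B^T = 0 (tiny dense problems only)."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorization: vec(P X Q^T) = kron(P, Q) vec(X).
    K = np.kron(A, I) + np.kron(I, A)
    for N in N_list:
        K = K + np.kron(N, N)
    X = np.linalg.solve(K, -(B @ B.T).reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)  # remove rounding-level asymmetry


def factor(X):
    """Return L with X ~= L^T L; tiny negative eigenvalues are clipped to zero."""
    w, Q = np.linalg.eigh(X)
    return np.sqrt(np.clip(w, 0.0, None))[:, None] * Q.T


def bilinear_bt(A, N_list, B, C, r):
    """Balanced truncation of a bilinear system to order r, following Algorithm 2.3."""
    X = bilinear_gramian(A, N_list, B)
    Y = bilinear_gramian(A.T, [N.T for N in N_list], C.T)
    LX, LY = factor(X), factor(Y)
    U, s, Zt = np.linalg.svd(LX @ LY.T)        # L_X L_Y^T = U Sigma Z^T
    S = np.diag(s[:r] ** -0.5)
    V = LX.T @ U[:, :r] @ S
    W = LY.T @ Zt[:r, :].T @ S
    reduced = (W.T @ A @ V, [W.T @ N @ V for N in N_list], W.T @ B, C @ V)
    return reduced, V, W, s, X


# Hypothetical test system with a small bilinear term.
n = 5
A = -np.diag(np.arange(1.0, n + 1))
N_list = [0.1 * np.eye(n, k=1)]
B = np.ones((n, 1))
C = np.ones((1, n))
(Ar, Nr, Br, Cr), V, W, hsv, X = bilinear_bt(A, N_list, B, C, r=2)
```

By construction $W^T V = I_r$, so the reduced matrices are an oblique projection of the original ones. For large-scale problems, the low-rank solvers referenced in the text replace the dense solve used here.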


A few remarks on the properties of the method are in order. The quadratic forms
$x^T X^{-1} x$, $x^T Y x$ and some related variants have been compared to the energy
functionals $L_c$ and $L_o$ defined in (2.43) in several publications, e.g., [13, 52, 53, 54]. In [13,
Propositions 3.5 and 3.8], it is shown that the input energy $L_c$ (resp. the output
energy $L_o$) can locally (around the origin) be bounded from below (resp. from above)
by means of $x^T X^{-1} x$ (resp. $x^T Y x$). Similar to the linear case, preservation of system
stability is guaranteed for the reduced-order model; see [14]. Let us mention that the
notion of stability used in the latter work is based on the eigenvalues of the
generalized Lyapunov operator and is stronger than asymptotic stability of the system
matrix $A$; see [36]. Recently, a further pair of generalized algebraic Gramians has been
suggested in [104]. These Gramians are inspired by similar quantities introduced in
the context of model reduction of linear stochastic systems in [37]. In contrast to (2.46),
here nonlinear matrix inequalities have to be solved. The benefit of these Gramians
is that an $\mathcal{H}_\infty$-type error bound can be shown; see [104, Theorem 4.1]. However, from
a computational point of view, even for large-scale systems, it is easier to solve
generalized matrix equations of the form (2.46). For references on (low-rank) approximation
techniques, we refer to [10, 11, 36, 75].

As a further field of current research, we mention balancing-based model
reduction for quadratic-bilinear control systems of the form

\[
\begin{aligned}
\dot{x}(t) &= Ax(t) + H\bigl(x(t) \otimes x(t)\bigr) + \sum_{i=1}^{m} N_i x(t) u_i(t) + Bu(t), \\
y(t) &= Cx(t),
\end{aligned}
\]

where $A$, $N_i$, $B$ and $C$ are as before and, moreover, $H \in \mathbb{R}^{n \times n^2}$. The interest in such
systems goes back to the results obtained in [58, 59], where the author shows that a
certain class of smooth, nonlinear, control affine systems can equivalently be expressed
in such a form. In [15], the authors follow an approach that is similar to the bilinear
version described above. In particular, a pair of nonlinear coupled generalized
Lyapunov equations

\[
\begin{aligned}
AX + XA^T + H (X \otimes X) H^T + \sum_{i=1}^{m} N_i X N_i^T + BB^T &= 0, \\
A^T Y + YA + H^{(2)} (X \otimes Y) \bigl(H^{(2)}\bigr)^T + \sum_{i=1}^{m} N_i^T Y N_i + C^T C &= 0
\end{aligned}
\]

is derived and also compared to the energy functionals $L_c$ and $L_o$. Here, $H^{(2)}$ denotes
the mode-2 matricization of the three-dimensional tensor $\mathcal{H} \in \mathbb{R}^{n \times n \times n}$ associated to
the matrix $H \in \mathbb{R}^{n \times n^2}$. Moreover, an approximation procedure based on a series of
generalized linear matrix equations is discussed in [15] and numerically investigated
for several nonlinear PDEs.


2.6 Other balancing issues 


2.6.1 Balanced truncation for discrete-time systems 


The balanced truncation model reduction method can also be formulated for discrete- 
time control systems of the form 


\[
\begin{aligned}
x_{k+1} &= Ax_k + Bu_k, \\
y_k &= Cx_k + Du_k,
\end{aligned}
\tag{2.47}
\]


with the state $x_k$, control $u_k$ and output $y_k$. For such systems, instead of the
continuous-time Lyapunov equations (2.13) and (2.14), one has to solve the discrete-time Lyapunov
equations

\[
AXA^T - X = -BB^T, \qquad A^T Y A - Y = -C^T C
\tag{2.48}
\]

for the controllability and observability Gramians $X$ and $Y$, provided all eigenvalues
of $A$ lie inside the unit disc. Then the reduced-order model is computed analogously
to the continuous-time case, by balancing $X$ and $Y$ and truncating the states
corresponding to small eigenvalues of the Gramians. Preservation of stability and an error
bound similar to (2.19) were proved in [69, 100]. Other balancing-related techniques
for discrete-time systems were considered in [38, 92].

The Lyapunov-based balanced truncation approach was also extended to periodic
discrete-time systems in [42, 133, 134, 135, 136]. The Gramians for such systems can 
be determined as solutions to the periodic discrete-time Lyapunov equations. Using 
a lifted representation [125] for the periodic system, these equations can be written in 
the form (2.48) with block structured matrices A, B and C. Efficient numerical meth- 
ods for such structured equations were presented in [74]. They exploit the block spar- 
sity in the lifted system matrices and rely on low-rank techniques. Model reduction by 
balanced truncation for (periodic) discrete-time descriptor systems was considered in 
[20, 21, 34, 126]. 
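For the (non-periodic) discrete-time Lyapunov equations (2.48), the following sketch computes both Gramians of a tiny system and cross-checks one of them against its series representation. It is an illustration only: the helper `dlyap` is a hypothetical name, it uses dense Kronecker vectorization that is viable only for very small state dimensions, and the example matrices are made up.

```python
import numpy as np


def dlyap(A, Q):
    """Solve A X A^T - X = -Q by row-major vectorization (small dense problems only)."""
    n = A.shape[0]
    K = np.kron(A, A) - np.eye(n * n)   # vec(A X A^T) = kron(A, A) vec(X)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)


# Hypothetical stable system: all eigenvalues of A lie inside the unit disc.
A = np.array([[0.5, 0.2], [0.0, -0.3]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

X = dlyap(A, B @ B.T)        # controllability Gramian
Y = dlyap(A.T, C.T @ C)      # observability Gramian

# Hankel singular values: square roots of the eigenvalues of X Y.
hsv = np.sqrt(np.sort(np.linalg.eigvals(X @ Y).real)[::-1])

# Cross-check with the series X = sum_k A^k B B^T (A^T)^k, truncated after 200 terms.
X_series = sum(
    np.linalg.matrix_power(A, k) @ B @ B.T @ np.linalg.matrix_power(A.T, k)
    for k in range(200)
)
```

The series only converges when the spectral radius of $A$ is below one, which mirrors the solvability condition stated for (2.48).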


2.6.2 Balanced truncation for second-order systems 


Model reduction of second-order control systems has received a lot of attention
because of their importance in structural mechanics, acoustics and vibration problems;
see [18, Chapters 2 and 3]. Balanced truncation for such systems was first considered
in [86] and then further investigated in [27, 33, 106]. We consider the second-order
system

\[
\begin{aligned}
M\ddot{q}(t) + D\dot{q}(t) + Kq(t) &= B_2 u(t), \\
C_1 \dot{q}(t) + C_0 q(t) &= y(t),
\end{aligned}
\tag{2.49}
\]

where $M, D, K \in \mathbb{R}^{n \times n}$ are the mass, damping and stiffness matrices. This system can
be written as a first-order system

\[
\begin{aligned}
E\dot{x}(t) &= Ax(t) + Bu(t), \\
y(t) &= Cx(t),
\end{aligned}
\tag{2.50}
\]

with $x = [q^T \; \dot{q}^T]^T$ and

\[
E = \begin{bmatrix} N & 0 \\ 0 & M \end{bmatrix}, \qquad
A = \begin{bmatrix} 0 & N \\ -K & -D \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 \\ B_2 \end{bmatrix}, \qquad
C = \begin{bmatrix} C_0 & C_1 \end{bmatrix},
\tag{2.51}
\]

where $N$ is an arbitrary nonsingular matrix. The controllability and observability
Gramians of this system solve the generalized Lyapunov equations

\[
EXA^T + AXE^T = -BB^T, \qquad E^T Y A + A^T Y E = -C^T C.
\tag{2.52}
\]
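The linearization (2.50)-(2.51) can be checked numerically by comparing transfer function values. In the sketch below, the helper name `first_order_form` and the small test matrices are hypothetical; the check uses the fact that the second-order system has the transfer function $(C_0 + sC_1)(s^2 M + sD + K)^{-1} B_2$, independently of the choice of the nonsingular matrix $N$.

```python
import numpy as np


def first_order_form(M, D, K, B2, C0, C1, N=None):
    """Assemble E, A, B, C of the first-order form for x = [q; dq/dt]."""
    n = M.shape[0]
    N = np.eye(n) if N is None else N          # any nonsingular N is admissible
    Z = np.zeros((n, n))
    E = np.block([[N, Z], [Z, M]])
    A = np.block([[Z, N], [-K, -D]])
    B = np.vstack([np.zeros_like(B2), B2])
    C = np.hstack([C0, C1])
    return E, A, B, C


# Hypothetical two-degree-of-freedom example with proportional damping.
M = np.diag([2.0, 1.0])
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
D = 0.1 * M + 0.1 * K
B2 = np.array([[1.0], [0.0]])
C0 = np.array([[0.0, 1.0]])
C1 = np.array([[0.5, 0.0]])

s = 0.3 + 1.2j
E, A, B, C = first_order_form(M, D, K, B2, C0, C1)
G_first = C @ np.linalg.solve(s * E - A, B)
G_second = (C0 + s * C1) @ np.linalg.solve(s**2 * M + s * D + K, B2)

# The same transfer function is obtained for a different choice of N, here N = -K.
E2, A2, Bk, Ck = first_order_form(M, D, K, B2, C0, C1, N=-K)
G_first_K = Ck @ np.linalg.solve(s * E2 - A2, Bk)
```

The freedom in $N$ does not affect the input-output map, but it does change the first-order realization and hence the Gramians that are balanced.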


Applying the balanced truncation method to (2.50) as described in Section 2.2, we
obtain a reduced first-order model which, in general, cannot be turned into
second-order form. Preserving the second-order structure in the reduced model often
guarantees the preservation of physical properties and allows the use of software tools
specially developed for second-order systems.
Thus, we partition the Gramians $X$ and $Y$ into $n \times n$ blocks as

\[
X = \begin{bmatrix} X_p & X_{12} \\ X_{12}^T & X_v \end{bmatrix}, \qquad
Y = \begin{bmatrix} Y_p & Y_{12} \\ Y_{12}^T & Y_v \end{bmatrix}.
\]

Then $X_p$ and $Y_p$ define the position controllability and observability Gramians, and $X_v$
and $Y_v$ define the velocity controllability and observability Gramians of the
second-order system (2.49). We refer to [33, 86] for the energy interpretation of these
Gramians. By balancing one of the pairs $(X_p, Y_p)$, $(X_p, Y_v)$, $(X_v, Y_p)$ or $(X_v, Y_v)$, we get the
position-position, position-velocity, velocity-position or velocity-velocity balanced
realizations, respectively. Considering the Cholesky factorizations

\[
X_p = L_{X,p}^T L_{X,p}, \qquad X_v = L_{X,v}^T L_{X,v}, \qquad Y_p = L_{Y,p}^T L_{Y,p}, \qquad Y_v = L_{Y,v}^T L_{Y,v},
\]

the corresponding balancing transformation matrices can be determined from the
SVD of the matrices $L_{X,p} L_{Y,p}^T$, $L_{X,p} M^T L_{Y,v}^T$, $L_{X,v} L_{Y,p}^T$ and $L_{X,v} M^T L_{Y,v}^T$; see [106]. This leads
to four different second-order balanced truncation approaches which provide the
reduced model in the second-order form,

\[
\begin{aligned}
\hat{M} \ddot{\hat{q}}(t) + \hat{D} \dot{\hat{q}}(t) + \hat{K} \hat{q}(t) &= \hat{B}_2 u(t), \\
\hat{C}_1 \dot{\hat{q}}(t) + \hat{C}_0 \hat{q}(t) &= \hat{y}(t),
\end{aligned}
\]

where $\hat{M} = W^T M V$, $\hat{D} = W^T D V$, $\hat{K} = W^T K V$, $\hat{B}_2 = W^T B_2$, $\hat{C}_0 = C_0 V$, and $\hat{C}_1 = C_1 V$
with appropriate projection matrices $W$ and $V$. Unlike first-order balanced
truncation, stability is not necessarily preserved in the reduced model (see [106] for some
examples), and there exists no error bound. However, for symmetric second-order
systems with $M = M^T > 0$, $D = D^T > 0$, $K = K^T > 0$, $C_0 = 0$ and $C_1 = B_2^T$, it was
shown in [22] that choosing $N = -K$ in (2.51) one can guarantee the preservation of
stability and symmetry for the position-position and velocity-velocity balanced
truncation methods. Moreover, for $N = I$, the position-velocity balanced truncation also
preserves stability and symmetry in the reduced-order model [106]. Finally, note that
the low-rank Cholesky factors of the position and velocity Gramians can be computed
using the ADI method applied to (2.52) without explicitly forming the double-sized
matrices $E$, $A$, $B$ and $C$ as in (2.51); see [22, 27] for details.


2.7 Numerical examples 


In this section, we present two numerical examples which illustrate the applicability
and the limitations of classical balanced truncation.

2.7.1 Heat equation


As a first example, let us consider a one-dimensional linear heat equation with boundary
control. In particular, we consider

\[
\begin{aligned}
x_t &= x_{\xi\xi}, & (\xi, t) &\in (0,1) \times (0,T), \\
x(0,t) &= 0, & t &\in (0,T), \\
x_\xi(1,t) &= u(t), & t &\in (0,T), \\
x(\xi,0) &= 0, & \xi &\in (0,1),
\end{aligned}
\]

where $x(\xi, t)$ describes the evolution of a temperature distribution on the interval
$[0, 1]$. We assume that the average temperature can be measured such that the output
variable $y(t)$ is given by

\[
y(t) = \int_0^1 x(\xi, t)\, d\xi \quad \text{for } t > 0.
\]

A finite difference discretization on a grid with $n$ interior grid points and grid size
$h = \frac{1}{n+1}$ then yields a finite-dimensional control system (2.1) with system matrices

\[
A = \frac{1}{h^2} \begin{bmatrix} -2 & 1 & & \\ 1 & -2 & 1 & \\ & \ddots & \ddots & \ddots \\ & & 1 & -2 & 1 \\ & & & 1 & -1 \end{bmatrix}, \qquad
B = \frac{1}{h} \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \qquad
C = h \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}.
\]
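The discretization above and the subsequent balanced truncation step can be reproduced on a small grid with the following sketch. It rests on several assumptions that should be read as such: the grid size is taken as $h = 1/(n+1)$, the Lyapunov equations are assumed to have the standard form $AX + XA^T + BB^T = 0$ and $A^TY + YA + C^TC = 0$, they are solved through an eigendecomposition that exploits the symmetry of $A$ (a substitute for the Cholesky-factor solver mentioned in the text), and $n = 20$, $r = 3$ are chosen only to keep the example fast.

```python
import numpy as np


def heat_system(n):
    """Finite-difference matrices of the boundary-controlled 1D heat equation."""
    h = 1.0 / (n + 1)                       # assumed grid size
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[-1, -1] = -1.0                        # Neumann boundary condition at xi = 1
    A /= h**2
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0 / h
    C = h * np.ones((1, n))
    return A, B, C


def lyap_symmetric(A, Q):
    """Solve A X + X A + Q = 0 for symmetric negative definite A."""
    lam, S = np.linalg.eigh(A)
    Xt = -(S.T @ Q @ S) / (lam[:, None] + lam[None, :])
    X = S @ Xt @ S.T
    return 0.5 * (X + X.T)


def factor(X):
    """Return L with X ~= L^T L; tiny negative eigenvalues are clipped to zero."""
    w, Q = np.linalg.eigh(X)
    return np.sqrt(np.clip(w, 0.0, None))[:, None] * Q.T


def balanced_truncation(A, B, C, r):
    """Square root balanced truncation to order r (A symmetric negative definite)."""
    LX = factor(lyap_symmetric(A, B @ B.T))
    LY = factor(lyap_symmetric(A, C.T @ C))
    U, hsv, Zt = np.linalg.svd(LX @ LY.T)
    S = np.diag(hsv[:r] ** -0.5)
    V = LX.T @ U[:, :r] @ S
    W = LY.T @ Zt[:r, :].T @ S
    return W.T @ A @ V, W.T @ B, C @ V, hsv


def transfer(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B for a SISO system."""
    return (C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B))[0, 0]


n, r = 20, 3
A, B, C = heat_system(n)
Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r)
bound = 2.0 * hsv[r:].sum()                 # a priori error bound from the neglected values
errors = [
    abs(transfer(A, B, C, 1j * w) - transfer(Ar, Br, Cr, 1j * w))
    for w in (0.0, 0.1, 1.0, 10.0, 100.0)
]
```

Comparing `errors` with `bound` at a few frequencies gives a small-scale analogue of the error plot discussed below; a frequency sample can only underestimate the true $\mathcal{H}_\infty$-error.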


In our results, we have used a discretization with $n = 5000$. According to the
pseudocode from Algorithm 2.1, we employed the MATLAB routine lyapchol to
compute the solutions $X$ and $Y$ of the controllability and observability Lyapunov
equations (2.13) and (2.14). The numerical rank of the matrix $L_X L_Y^T$ in this case was only
11. Hence, a numerically (up to machine precision) exact copy of the original system
can be realized by a system of dimension $r = 11$. This is confirmed by the results from
Figure 2.1. The so-called Bode plot of the reduced transfer function $\hat{G}(s) = \hat{C}(sI - \hat{A})^{-1}\hat{B}$
cannot be distinguished from the original transfer function $G(s) = C(sI - A)^{-1}B$. A
similar conclusion can be drawn for the impulse response functions $h(t) = Ce^{At}B$ and
$\hat{h}(t) = \hat{C}e^{\hat{A}t}\hat{B}$ presented in Figure 2.2. Figure 2.3 compares the $\mathcal{H}_\infty$-error corresponding
to balanced reduced-order systems of dimension $r = 1, \ldots, 11$ with the guaranteed
error bound determined by the neglected Hankel singular values. We emphasize that
the heat equation and, more generally, parabolic PDEs with finite-dimensional input
or output space are well-suited for model reduction by balanced truncation. For more
details on a functional analytic discussion, we refer to [96].

Figure 2.1: Bode plots for the 1D heat equation (original system, n = 5000; reduced system, r = 11).

Figure 2.2: Impulse responses for the 1D heat equation (original system, n = 5000; reduced system, r = 11).

Figure 2.3: $\mathcal{H}_\infty$-error for varying reduced system dimensions r for the 1D heat equation.


2.7.2 Transport equation


The second example is a one-dimensional boundary controlled transport equation
and has been discussed in more detail in [96]. Let us consider

\[
\begin{aligned}
x_t &= -x_\xi, & (\xi, t) &\in (0,1) \times (0,T), \\
x(0,t) &= u(t), & t &\in (0,T), \\
x(\xi,0) &= 0, & \xi &\in (0,1).
\end{aligned}
\]

We assume that the right boundary can be observed. Hence, we obtain the output
variable $y(t) = x(1,t)$ for $t > 0$.

For the spatial discretization, we employ an upwind scheme and thus use a
backward finite difference for the approximation of the transport term. We consider a grid
with $n$ grid points equally distributed over the interval $[h, 1]$, where $h = \frac{1}{n}$. Note that
in contrast to the first example, we only eliminate the left boundary value, which is
prescribed by the dynamics. As a result, we obtain the following system matrices:


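\[
A = \frac{1}{h} \begin{bmatrix} -1 & & & \\ 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix}, \qquad
B = \frac{1}{h} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad
C = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}.
\]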


As mentioned in [96], the transfer function of the above transport equation is
$G_{\mathrm{true}}(s) = e^{-s}$. Note that this irrational transfer function is approximated by rational
transfer functions of the form $G(s) = C(sI - A)^{-1}B$. In Figure 2.4, the Bode plots of the
original and reduced transfer functions are shown. Note that the spatial discretization
and its realization $(A, B, C)$ already introduce a visible error. In contrast to the
analytic expression $G_{\mathrm{true}}(s) = e^{-s}$, which has constant modulus 1 along the imaginary
axis, the magnitude of $G(s) = C(sI - A)^{-1}B$ decreases for increasing frequencies. Let us
emphasize that transport equations such as the one above are generally less suitable
for model reduction by balanced truncation. This is also apparent from the relatively
large reduced system dimension ($r = 180$) of the numerically (up to machine precision)
exact copy of the original semi-discretized system.
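The decay of $|G(i\omega)|$ for the semi-discretized system can be made explicit. For the bidiagonal upwind matrices, eliminating the states one by one gives $G(s) = (1 + sh)^{-n}$ with $h = 1/n$, which tends to $e^{-s}$ as $n \to \infty$ but has modulus below 1 for every $\omega > 0$. The sketch below checks this closed form numerically; the function names are hypothetical and $n = 50$ is chosen only for illustration.

```python
import numpy as np


def transport_system(n):
    """Upwind discretization of the boundary-controlled 1D transport equation."""
    h = 1.0 / n
    A = (-np.eye(n) + np.eye(n, k=-1)) / h
    B = np.zeros((n, 1))
    B[0, 0] = 1.0 / h                      # inflow boundary value enters the first cell
    C = np.zeros((1, n))
    C[0, -1] = 1.0                         # observation at the right boundary
    return A, B, C


def transfer(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B for a SISO system."""
    return (C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B))[0, 0]


n = 50
A, B, C = transport_system(n)
omegas = [0.1, 1.0, 10.0, 100.0]
G = [transfer(A, B, C, 1j * w) for w in omegas]
G_closed = [(1.0 + 1j * w / n) ** (-n) for w in omegas]   # (1 + s h)^(-n)
G_true = [np.exp(-1j * w) for w in omegas]                # modulus 1 for all omega
```

Since $|G(i\omega)| = (1 + \omega^2/n^2)^{-n/2}$, the discretized system damps high frequencies, which is the visible discrepancy to $G_{\mathrm{true}}$ described above.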

Since the original transfer function is given by $G_{\mathrm{true}}(s) = e^{-s}$, its underlying
impulse response is $h(t) = \delta(t-1)$, i.e., a unit point mass at $t = 1$. Again, the results presented
in Figure 2.5 underline that the spatially discretized system only approximates the
original dynamics. Nevertheless, the impulse responses $h(t) = Ce^{At}B$ and $\hat{h}(t) = \hat{C}e^{\hat{A}t}\hat{B}$
are almost identical. Finally, Figure 2.6 shows the decrease of the $\mathcal{H}_\infty$-error for
increasing reduced system dimension $r$, together with the available a priori error bound. We
observe that the approximability is worse than in the case of the heat equation. A
theoretical explanation can be given in terms of the decay rate of the Hankel singular
values of the system; see [96].

Figure 2.4: Bode plots for the 1D transport equation.

Figure 2.5: Impulse responses for the 1D transport equation.

Figure 2.6: $\mathcal{H}_\infty$-error for varying reduced system dimensions r for the 1D transport equation.


2.8 Conclusions


This chapter provided an introduction to the general concept of system balancing. The 
focus was on classical balanced truncation and some of its most common variants. 
Based on an illustrative example, the main idea was derived and also shown to be ap- 
plicable to spatially semi-discretized partial differential equations. While the meth- 
ods were mainly discussed by means of finite-dimensional continuous-time control 
systems, several generalizations to other system classes exist and can be found in the 
literature. We refer to [18, Chapter 13] for a review of existing software tools
implementing balanced truncation model reduction algorithms. With increasing complexity and
available data, the development of efficient implementations remains an active research topic.
In particular, so-called data-driven methods as presented in [17, Chapter 7] seem set to
become an indispensable part of future methods.


Abstract: This is a survey of model order reduction (MOR) methods based on
moment-matching. Moment-matching methods for linear non-parametric and parametric
systems are reviewed in detail. Extensions of moment-matching methods to nonlinear
systems are also discussed. Efficient algorithms for computing the reduced-order
models (ROMs) are presented.


3.1 Introduction


In this chapter, we focus on linear time-invariant (LTI) systems

\[
\begin{aligned}
E(\mu) \frac{dx(t,\mu)}{dt} &= A(\mu) x(t,\mu) + B(\mu) u(t), \\
y(t,\mu) &= C(\mu) x(t,\mu) + D(\mu) u(t),
\end{aligned}
\tag{3.1}
\]

and mildly nonlinear systems

\[
\begin{aligned}
E(\mu) \frac{dx(t,\mu)}{dt} &= f(x(t,\mu), \mu) + B(\mu) u(t), \\
y(t,\mu) &= C(\mu) x(t,\mu) + D(\mu) u(t),
\end{aligned}
\]

with and without parameters. Here $x(t,\mu) \in \mathbb{R}^n$ is the state vector, and its entries
are called state variables; $n$ is often referred to as the order of the system. The vector
$\mu \in \mathbb{R}^m$ includes all of the geometrical and physical parameters. The system
matrices $E(\mu), A(\mu) \in \mathbb{R}^{n \times n}$, $B(\mu) \in \mathbb{R}^{n \times n_I}$, $C(\mu) \in \mathbb{R}^{n_O \times n}$ and $D(\mu) \in \mathbb{R}^{n_O \times n_I}$ may depend on
the parameters. The vector $f(x, \mu) \in \mathbb{R}^n$ is a nonlinear function. The system in (3.1) is
called the state-space representation of the system. It may result from the spatial
discretization of partial differential equations (PDEs) describing certain processes like
fluid dynamics, temperature distribution in devices, electric circuits, etc.

For most MOR methods, the term $D(\mu)u(t)$ remains unchanged during the process
of MOR. For simplicity, we therefore assume that $D(\mu) = 0$, a zero matrix. There are
$n_I$ input terminals and $n_O$ output terminals. When $n_I = n_O = 1$, the system is called
a single-input single-output (SISO) system. Otherwise, if $n_I, n_O > 1$, it is called a
multi-input multi-output (MIMO) system.

The basic idea of (P)MOR methods is as follows. Find a low-dimensional trial
subspace $S_1$ which approximates well the manifold where the state vector $x(t,\mu)$ resides.
Then $x(t,\mu)$ is approximated by a vector $\tilde{x}(t,\mu)$ in $S_1$, which causes a residual of the state
equation. The reduced-order model (ROM) is obtained by a (Petrov-)Galerkin
projection of the residual onto a test subspace $S_2$. In particular, one computes an
orthonormal matrix $V = (v_1, v_2, \ldots, v_r)$ whose columns span $S_1$. The ROM is derived by the
following two steps.

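1. By replacing $x(t,\mu)$ in (3.1) with $Vz(t,\mu)$, we obtain

\[
\begin{aligned}
E(\mu) \frac{d\,Vz(t,\mu)}{dt} &\approx A(\mu) V z(t,\mu) + B(\mu) u(t), \\
y(t,\mu) &\approx C(\mu) V z(t,\mu).
\end{aligned}
\tag{3.2}
\]

2. Notice that the equations in (3.1) do not hold any longer. Therefore, we can only
use "$\approx$" in (3.2). Denote the residual as $e(t,\mu) = A(\mu)Vz(t,\mu) + B(\mu)u(t) - E(\mu) V \frac{dz(t,\mu)}{dt}$,
which in general is nonzero over the whole vector space $\mathbb{R}^n$. However, it is possible
to force $e = 0$ in a properly chosen subspace $S_2$ of $\mathbb{R}^n$. If we have computed a
matrix $W \in \mathbb{R}^{n \times r}$ whose columns span $S_2$, then $e = 0$ in $S_2$ means that $e$ is orthogonal
to each column of $W$, i.e., $W^T e = 0 \Longrightarrow W^T E V \frac{dz(t,\mu)}{dt} = W^T A V z(t,\mu) + W^T B u(t)$.
Finally, we obtain the ROM

\[
\begin{aligned}
\hat{E}(\mu) \frac{dz(t,\mu)}{dt} &= \hat{A}(\mu) z(t,\mu) + \hat{B}(\mu) u(t), \\
\hat{y}(t,\mu) &= \hat{C}(\mu) z(t,\mu),
\end{aligned}
\tag{3.3}
\]

where $\hat{E}(\mu) = W^T E(\mu) V \in \mathbb{R}^{r \times r}$, $\hat{A}(\mu) = W^T A(\mu) V \in \mathbb{R}^{r \times r}$, $\hat{B}(\mu) = W^T B(\mu) \in \mathbb{R}^{r \times n_I}$ and
$\hat{C}(\mu) = C(\mu) V \in \mathbb{R}^{n_O \times r}$. Here $z(t,\mu) \in \mathbb{R}^r$ is a vector of length $r \ll n$. Then $x(t,\mu)$ can be
approximated by $x(t,\mu) \approx Vz(t,\mu)$. The system in (3.3) is referred to as the
reduced-order model (ROM), since it is of much smaller order than the original system in (3.1),
i.e., $r \ll n$. The ROM can then replace the original system for fast simulation.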
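The two steps above amount to a handful of matrix products once $V$ and $W$ are available. The following sketch shows the projection for a fixed parameter value; the function names, the random test system and the choice $W = V$ (a Galerkin projection) are assumptions made for illustration. As a sanity check, projecting with a square orthogonal $V = W$ (that is, $r = n$) only changes coordinates and must leave the transfer function unchanged.

```python
import numpy as np


def project(E, A, B, C, V, W):
    """Petrov-Galerkin projection with trial basis V and test basis W, as in (3.3)."""
    return W.T @ E @ V, W.T @ A @ V, W.T @ B, C @ V


def transfer(E, A, B, C, s):
    """Evaluate C (sE - A)^{-1} B."""
    return C @ np.linalg.solve(s * E - A, B)


# Hypothetical test system (fixed parameter value, D = 0).
rng = np.random.default_rng(1)
n, r = 6, 2
E = np.eye(n)
A = -2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

# Arbitrary orthonormal trial basis; how to choose it well is the topic of this chapter.
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
Er, Ar, Br, Cr = project(E, A, B, C, V, V)

# With a full orthogonal basis (r = n) the projection is a change of coordinates.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
H_full = transfer(E, A, B, C, 1j)
H_same = transfer(*project(E, A, B, C, Q, Q), 1j)
```

An arbitrary $V$ as used here gives no accuracy guarantee; the methods discussed below differ precisely in how $V$ and $W$ are constructed.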

MOR methods differ in computing the two matrices W and V. One common goal of 
all methods is that the input-output behavior of the ROM should be sufficiently “close” 
to that of the original model. The error between the transfer functions (see (3.6)) is also 
used to measure the accuracy of the ROM. 

Moment-matching relates to a class of methods which construct the ROM by 
building the projection matrices W, V from the system information in the frequency 
domain. The early moment-matching methods are only applicable to linear non- 
parametric systems. Later on, these methods were extended to linear parametric 
systems. Based on nonlinear system theory [52], multi-moment-matching methods 
based on variational analysis were proposed and are successful in reducing weakly
nonlinear systems. In contrast to snapshot-based time-domain methods, e.g.,
proper orthogonal decomposition or the reduced basis method, moment-matching
methods can be considered as frequency-domain methods, and are independent of
the inputs. Therefore, these methods are robust for systems with varying inputs. The
rest of this chapter is organized as follows. In Section 3.2, we introduce moment- 
matching methods for linear non-parametric systems, where methods based on ratio- 
nal interpolation are particularly discussed. The extension of those methods to linear 
parametric systems is introduced in Section 3.3. Methods based on moment-matching 
and multi-moment-matching for nonlinear systems are reviewed in Section 3.4, and 
their extension to parametric nonlinear systems is discussed in Section 3.5. Conclu- 
sions are drawn in the end. 


3.2 Moment-matching for linear non-parametric 
systems 


This section reviews moment-matching methods for linear non-parametric systems,
so that the vector of parameters $\mu$ can be dropped from the system (3.1). Among the
early works on moment-matching MOR, the method of Asymptotic Waveform
Evaluation (AWE) in [48] was shown to be able to reduce large-scale interconnected
electrical circuit models, which stimulated broad interest in this kind of method. The AWE
method tries to find a Padé approximation of the transfer function $H(s)$, which can be
computed much more quickly than $H(s)$ itself.


Transfer function 

For all the methods introduced in this chapter, the transfer function of the system is 
used to either derive the ROM, or to perform the error estimation. The transfer func- 
tion of the system in (3.1) is the input/output relation of the system in the frequency 
domain. By applying the Laplace transform to both sides of the equations in (3.1), we 
obtain 


sEX(s) — Ex(0) = AX(s) + BU(s), (3.4) 
Y(s) = CX(s). (3.5) 


Here, X(s) is the Laplace transform of x(t), and x(0) is the initial state of the system. 
Assuming that x(0) = 0, we obtain the expression for the transfer function 


\[
H(s) = Y(s)/U(s) = C(sE - A)^{-1}B,
\tag{3.6}
\]

where the right division "/" has to be understood in a formal way for MIMO systems.
For a SISO system, the transfer function $H(s)$ is a scalar function. The Padé
approximation of a scalar function can be defined as follows.

Padé approximation
The Padé approximation of a function $H(s)$ is a rational function $\hat{H}_{p,q}(s)$, with numerator
degree at most $p$ and denominator degree at most $q$, whose Taylor
series at $s = 0$ agrees with that of $H(s)$ in at least the first $p + q + 1$ terms [21].

For a MIMO system, the transfer function $H(s)$ is a matrix function and each entry
of it can be approximated by the above Padé approximation. For clarity of explanation,
we use a SISO system as an example to briefly describe the method.

From the definition of the Padé approximation, we know that, if $\hat{H}(s) = \hat{H}_{p,q}(s_0 + \sigma)
= P_p(\sigma)/Q_q(\sigma)$ is a Padé approximation of the transfer function $H(s_0 + \sigma)$, then we
have

\[
\hat{H}_{p,q}(s_0 + \sigma) = H(s_0 + \sigma) + O(\sigma^{p+q+1}).
\tag{3.7}
\]

The derivatives of $H(s_0 + \sigma)$ at $\sigma = 0$ are actually the derivatives of $H(s)$ at $s = s_0$; up to
scaling, they are also the moments $m_i(s_0)$, $i = 0, 1, \ldots$ (see the definition in Section 3.2.1.1), of the
transfer function. By definition, the Padé approximation $\hat{H}_{p,q}(s_0 + \sigma)$ matches the first
$p + q + 1$ moments of the transfer function.

If the coefficients of the two polynomials $P_p(\sigma)$ and $Q_q(\sigma)$ in $\hat{H}_{p,q}(s_0 + \sigma)$ are
computed, then $\hat{H}_{p,q}(s_0 + \sigma)$ is obtained. The coefficients can be obtained by solving two
groups of equations which are derived from equating the coefficients of the Taylor
series expansion (at $\sigma = 0$) on both sides of (3.7).

Since the moments are the coefficients of the Taylor series expansion of $H(s_0 + \sigma)$
at $\sigma = 0$, they are involved in solving the equations to obtain the coefficients of $P_p(\sigma)$
and $Q_q(\sigma)$. However, in the AWE method, the moments are computed explicitly, which
can cause serious numerical instability.
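The two groups of equations can be written down directly for a scalar function. The sketch below is a generic textbook construction, not the AWE implementation: given Taylor coefficients $c_0, \ldots, c_{p+q}$ (the moments), it normalizes $Q_q(0) = 1$, solves a $q \times q$ linear system for the denominator and then reads off the numerator. The function name is hypothetical and $q \ge 1$ is assumed.

```python
import numpy as np


def pade_from_taylor(c, p, q):
    """Coefficients (a, b) of P_p / Q_q with b[0] = 1 from Taylor coefficients c[0..p+q].

    Assumes q >= 1 and that the q x q system for the denominator is nonsingular.
    """
    c = np.asarray(c, dtype=float)

    def coef(k):
        return c[k] if k >= 0 else 0.0

    # Denominator: sum_{j=0}^{q} b_j c_{p+k-j} = 0 for k = 1, ..., q.
    T = np.array([[coef(p + k - j) for j in range(1, q + 1)] for k in range(1, q + 1)])
    rhs = -np.array([coef(p + k) for k in range(1, q + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(T, rhs)))

    # Numerator: a_i = sum_{j=0}^{min(i, q)} b_j c_{i-j} for i = 0, ..., p.
    a = np.array([sum(b[j] * coef(i - j) for j in range(min(i, q) + 1)) for i in range(p + 1)])
    return a, b


# Taylor coefficients of exp(x) up to order 2; the [1/1] Padé approximant is (1 + x/2)/(1 - x/2).
a_exp, b_exp = pade_from_taylor([1.0, 1.0, 0.5], p=1, q=1)

# Taylor coefficients of 1/(1 - x); the [0/1] approximant reproduces the function exactly.
a_geo, b_geo = pade_from_taylor([1.0, 1.0], p=0, q=1)
```

The matrix `T` is built from the moments themselves and tends to become severely ill-conditioned as $q$ grows, which illustrates why working with explicit moments is numerically delicate.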

In order to overcome the numerical instability of AWE, a more robust method,
"Padé via Lanczos" (PVL) [21] (see also [32]), was proposed. PVL also computes the
Padé approximation of $H(s) = H(s_0 + \sigma)$; however, the moments of $H(s)$ do not have
to be computed explicitly. Instead, an orthonormal basis of the subspace spanned by
the moment vectors is computed, which constitutes the projection matrix $V$, and the
projection matrix $W$ is computed simultaneously. Both of them are computed by
the nonsymmetric Lanczos process. It is proved in [21] that the transfer function of the
ROM produced by $W$ and $V$ is the Padé approximation of the original transfer
function $H(s)$. The PVL method avoids explicit computation of the moments, and therefore
avoids the possible numerical instability.

Unfortunately, PVL does not necessarily preserve passivity of the original system,
which is a problem in some engineering applications, especially in Integrated
Circuit (IC) design. To address this, the method "Passive and Reduced-order Interconnect
Macromodeling Algorithm" (PRIMA) [45] was proposed. The resulting ROM preserves
the passivity of the original system, under certain assumptions on the system
matrices. The trade-off is that only half the number of moments can be matched by PRIMA
as compared to PVL, if the matrix $V$ in both methods spans the same subspace.
Such an approximation of the transfer function is called a Padé-type approximation.
The rational interpolation method proposed in [35] extended the Padé-approximation
method and the Padé-type-approximation method based on a single
expansion point to the case of multiple expansion points. All these methods can be called
moment-matching methods, because all the ROMs match the moments of the original
system to different extents. For survey papers on moment-matching model reduction,
see [4, 31] and [3, 35].

Moment-matching MOR methods try to derive a ROM whose transfer function 
matches the moments of the transfer function of the original system. Generally speak- 
ing, the more moments matched, the more accurate the ROM will be. In the following, 
we first introduce the definition of the moments and moment vectors, then we show 
how to compute the matrices W and V based on moment-matching. 


3.2.1 Basic idea 


3.2.1.1 Moments and moment vectors 


If we expand the transfer function $H(s)$ into its Taylor series about an expansion point
$s_0$ such that $s_0 E - A$ is a nonsingular matrix,

\[
\begin{aligned}
H(s) &= C\bigl[(s - s_0 + s_0)E - A\bigr]^{-1} B \\
&= C\bigl[(s - s_0)E + (s_0 E - A)\bigr]^{-1} B \\
&= C\bigl[I + (s_0 E - A)^{-1} E (s - s_0)\bigr]^{-1} (s_0 E - A)^{-1} B \\
&= \sum_{i=0}^{\infty} \underbrace{C\bigl[-(s_0 E - A)^{-1} E\bigr]^{i} (s_0 E - A)^{-1} B}_{=:\, m_i(s_0)} \,(s - s_0)^i,
\end{aligned}
\tag{3.8}
\]

and if the system is a SISO system, then $m_i(s_0)$, $i = 0, 1, 2, \ldots$, are called the moments of
the transfer function $H(s)$. If the system is a MIMO system, then $m_i(s_0)$, $i = 0, 1, 2, \ldots$,
are matrices and they are called block moments [45]. In the field of circuit design,
the entry in the $j$th row, $k$th column of $m_i(s_0)$ is called the $i$th moment of the current
that flows into port $j$ when the voltage source at port $k$ is the only nonzero source [45].
Analogies exist for other application areas, such as mechanics. In this chapter, we only
consider $m_i(s_0)$ as a whole, and do not consider its entries individually. This means that,
when we talk about moments of the transfer function, we mean $m_i(s_0)$, $i = 0, 1, \ldots$,
which refers either to the moments of a SISO system or to the block moments of a
MIMO system.
From (3.4) and (3.8), it is straightforward to obtain the corresponding Taylor series
expansion of $X(s)$,

$$
X(s) = \sum_{i=0}^{\infty} [-(s_0E - A)^{-1}E]^{i}(s_0E - A)^{-1}B\,U(s)(s - s_0)^{i}. \tag{3.9}
$$

Here, we call the vectors $[-(s_0E - A)^{-1}E]^{i}(s_0E - A)^{-1}B$, $i = 0, 1, \ldots$, moment vectors, which
are to be used to compute the projection matrices $W$, $V$. Notice that when the system
in (3.1) has multiple inputs, i.e. $B$ has more than one column, then
$[-(s_0E - A)^{-1}E]^{i}(s_0E - A)^{-1}B$, $i = 0, 1, \ldots$,
are matrices rather than vectors. For simplicity, we still call them moment vectors.
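
The definitions in (3.8) and (3.9) can be tried out numerically. The following is a minimal sketch, assuming dense NumPy matrices and a small random test system that is not taken from the text; the function names `moments` and `transfer` are illustrative. It generates the moment vectors recursively and checks that the truncated Taylor series reproduces $H(s)$ close to $s_0$.

```python
import numpy as np

def moments(E, A, B, C, s0, count):
    """Return [m_0(s0), ..., m_{count-1}(s0)] following (3.8)."""
    K = s0 * E - A                           # must be nonsingular
    vec = np.linalg.solve(K, B)              # zeroth moment vector (s0 E - A)^{-1} B
    result = []
    for _ in range(count):
        result.append(C @ vec)               # m_i = C * (i-th moment vector)
        vec = -np.linalg.solve(K, E @ vec)   # next moment vector
    return result

def transfer(E, A, B, C, s):
    """Evaluate H(s) = C (sE - A)^{-1} B directly."""
    return C @ np.linalg.solve(s * E - A, B)

# Hypothetical small SISO test system (illustration only).
rng = np.random.default_rng(0)
n = 6
E = np.eye(n)
A = -np.diag(np.arange(1.0, n + 1)) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

s0, s = 1.0, 1.02
m = moments(E, A, B, C, s0, 10)
taylor = sum(m_i * (s - s0) ** i for i, m_i in enumerate(m))   # truncated series (3.8)
exact = transfer(E, A, B, C, s)
```

In a large sparse setting one would factorize $s_0E - A$ once and reuse the factors for every solve; the dense `np.linalg.solve` calls are only for brevity.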


3.2.1.2 Computation of the projection matrices W and V 


Computation of V 

Approximating $X(s)$ by the truncated series in (3.9) means that $X(s) \approx VZ(s)$. The
columns of $V \in \mathbb{R}^{n \times r}$ constitute an orthonormal basis of $S$, which is a subspace
spanned by the moment vectors in the truncated series. After inverse Laplace transform,
we obtain the corresponding approximation of $x(t)$ in $S$, i.e. $x(t) \approx Vz(t)$, where
$z(t)$ is the inverse Laplace transform of $Z(s)$. This means that $x(t)$ in the time domain
can be approximated by $Vz(t)$. Usually, the more moment vectors included in $S$, the
more accurate the approximation $Vz(t)$ will be. However, in order to keep the ROM
small, we usually choose a small number of moment vectors starting from $i = 0$, i.e.
the columns of the orthonormal matrix $V$ span the subspace

$$
\mathrm{range}\{V\} = \mathrm{span}\{\tilde B(s_0), \tilde A(s_0)\tilde B(s_0), \ldots, \tilde A^{q-1}(s_0)\tilde B(s_0)\}, \tag{3.10}
$$

where $\tilde A(s_0) = (s_0E - A)^{-1}E$, $\tilde B(s_0) = (s_0E - A)^{-1}B$ and $q \ll n$.


Computation of W 
To obtain the ROM, we also need to compute the (Petrov-)Galerkin projection matrix
$W$. The columns of the matrix $W$ span the subspace below, i.e.

$$
\mathrm{range}\{W\} = \mathrm{span}\{\tilde C(s_0), \tilde A_c(s_0)\tilde C(s_0), \ldots, \tilde A_c^{q-1}(s_0)\tilde C(s_0)\}, \tag{3.11}
$$

where $\tilde A_c(s_0) = (s_0E - A)^{-T}E^T$, $\tilde C(s_0) = (s_0E - A)^{-T}C^T$. Note that the two subspaces
in (3.10) and (3.11) are actually two Krylov subspaces $K_q(\tilde A(s_0), \tilde B(s_0))$ and
$K_q(\tilde A_c(s_0), \tilde C(s_0))$, respectively. Moment-matching methods based on computing $W$, $V$
from Krylov subspaces are often called Krylov-based methods. If the above two matrices
$W$ and $V$ are used to obtain the ROM (3.3), the transfer function of the ROM
matches the first $2q$ moments of the transfer function of the original model [35]. We
summarize this in the following theorem.


Theorem 3.1. If $V$ and $W$ span the subspaces in (3.10) and (3.11), respectively, then the
transfer function $\hat H(s) = \hat C(s\hat E - \hat A)^{-1}\hat B$ of the ROM (3.3) matches the first $2q$ moments of
the transfer function of the original system, i.e.

$$
m_i(s_0) = \hat m_i(s_0), \quad i = 0, 1, \ldots, 2q - 1,
$$

where $\hat m_i(s_0) = \hat C[-(s_0\hat E - \hat A)^{-1}\hat E]^{i}(s_0\hat E - \hat A)^{-1}\hat B$, $i = 0, 1, \ldots, 2q - 1$, are the $i$th-order
moments of $\hat H$.


Note that in order to ensure the projector property of $VW^T$, one also needs to enforce
the bi-orthogonality condition $W^TV = I$ (assuming here that the subspace basis matrices
$V$, $W$ are formed over $\mathbb{R}$). The moment-matching MOR method PRIMA [45] uses
$W = V$. In this case, only $q$ moments of the transfer function are matched.

The orthonormal matrices $W$ and $V$ in (3.10), (3.11) can be computed by rational
Krylov subspace algorithms (rational Lanczos algorithm, rational Arnoldi algorithm) [35].
Only (sparse) matrix factorizations, (sparse) forward/backward solves,
and matrix-vector multiplications are used in these algorithms, such that the complexity
of the moment-matching MOR is in $O(nq^2)$ for sparse matrices $E$, $A$.
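
The statement of Theorem 3.1 and the PRIMA remark can be checked on a small example. The sketch below is an illustration under stated assumptions, not the rational Arnoldi or Lanczos implementation of [35]: it uses a plain Arnoldi-style loop with Gram-Schmidt sweeps, dense solves, no deflation handling, and a hypothetical nonsymmetric SISO test system. The helper names `krylov_basis`, `moments` and `project` are illustrative.

```python
import numpy as np

def krylov_basis(apply_op, start, q):
    """Orthonormal basis of span{start, Op start, ..., Op^{q-1} start}."""
    cols = [start / np.linalg.norm(start)]
    for _ in range(q - 1):
        w = apply_op(cols[-1])
        for _ in range(2):                    # Gram-Schmidt sweep, done twice for safety
            for v in cols:
                w = w - (v @ w) * v
        cols.append(w / np.linalg.norm(w))    # no deflation handling in this sketch
    return np.column_stack(cols)

def moments(E, A, B, C, s0, count):
    """First `count` moments of C (sE - A)^{-1} B about s0 (SISO)."""
    K = s0 * E - A
    vec = np.linalg.solve(K, B)
    out = []
    for _ in range(count):
        out.append((C @ vec).item())
        vec = -np.linalg.solve(K, E @ vec)
    return np.array(out)

def project(E, A, B, C, W, V):
    """Reduced matrices of the (Petrov-)Galerkin ROM."""
    return W.T @ E @ V, W.T @ A @ V, W.T @ B, C @ V

# Hypothetical nonsymmetric SISO test system (illustration only).
n, q, s0 = 20, 3, 1.0
rng = np.random.default_rng(1)
ones = np.ones(n - 1)
E = np.eye(n)
A = -np.diag(np.linspace(1.0, 10.0, n)) + 0.5 * np.diag(ones, 1) - 0.3 * np.diag(ones, -1)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

K = s0 * E - A
V = krylov_basis(lambda x: np.linalg.solve(K, E @ x),
                 np.linalg.solve(K, B).ravel(), q)          # subspace (3.10)
W = krylov_basis(lambda x: np.linalg.solve(K.T, E.T @ x),
                 np.linalg.solve(K.T, C.T).ravel(), q)      # subspace (3.11)

m_full = moments(E, A, B, C, s0, 2 * q)
m_two_sided = moments(*project(E, A, B, C, W, V), s0, 2 * q)   # Petrov-Galerkin: 2q moments
m_one_sided = moments(*project(E, A, B, C, V, V), s0, q)       # W = V: q moments
```

Comparing `m_two_sided` with `m_full` entry by entry shows the $2q$ matched moments, while the one-sided ROM is only expected to agree in the first $q$ entries.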


3.2.2 Stability 


In general, the moment-matching methods do not preserve stability of the original
system. Only for systems with special structures do there exist approaches based
on Galerkin projection where the ROMs are guaranteed to be stable and passive; see,
e.g., [45]. For details on passivity of LTI systems, we refer to Chapter 5 of this volume.
The passivity preservation of the moment-matching method for RLC circuits can be
mathematically described as follows [45].


Theorem 3.2. If the system matrices $E$ and $A$ satisfy $E^T + E \geq 0$ and $A^T + A \leq 0$, respectively,
and if $C = B^T$, then the ROM obtained by Galerkin projection, i.e., $W = V$, preserves
the passivity of the original system in (3.1).
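
One ingredient of this result is easy to verify by hand: the three structural conditions of Theorem 3.2 are inherited by the reduced matrices $\hat E = V^TEV$, $\hat A = V^TAV$, $\hat B = V^TB$, $\hat C = CV$ of a Galerkin projection. The short derivation below only shows this inheritance for an arbitrary reduced vector $z$; it is a sketch of why the structure carries over, not a full passivity proof.

```latex
\begin{aligned}
z^T(\hat E^T + \hat E)z &= (Vz)^T(E^T + E)(Vz) \ge 0, \\
z^T(\hat A^T + \hat A)z &= (Vz)^T(A^T + A)(Vz) \le 0, \\
\hat C &= CV = B^TV = (V^TB)^T = \hat B^T .
\end{aligned}
```

With a Petrov-Galerkin projection ($W \neq V$) these congruence arguments are no longer available, which is why the theorem is stated for $W = V$.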


Stability is naturally guaranteed by passivity, therefore the ROM obtained by 
moment-matching with Galerkin projection preserves stability as well. Benefiting 
from the preservation of passivity and low computational complexity, the moment- 
matching method is very popular in circuit simulation and in micro-electro-mechanical 
systems (MEMS) simulation as well. 


3.2.3 Multiple expansion points 


The accuracy of the moment-matching methods depends not only on the number
of moments matched, but also on the expansion points. Since the Taylor expansion
in (3.8) is only accurate within a certain radius around the expansion point $s_0$, the
ROM becomes inaccurate beyond this radius.

To increase the accuracy of a single-point expansion, one may use more than one
expansion point. Moment-matching by multi-point expansion is also known as rational
interpolation [35]. For example, if using a set of $l$ distinct expansion points
$\{s_1, \ldots, s_l\}$, the ROM obtained by, e.g.,

$$
\begin{aligned}
\mathrm{range}\{V\} &= \mathrm{span}\{\tilde B(s_1), \ldots, \tilde B(s_l)\}, \\
\mathrm{range}\{W\} &= \mathrm{span}\{\tilde C(s_1), \ldots, \tilde C(s_l)\},
\end{aligned}
$$

matches the first two moments $m_0(s_i)$, $m_1(s_i)$ at each $s_i$, $i = 1, \ldots, l$ [35]. Here, $\tilde B(s_i) =
(s_iE - A)^{-1}B$, $\tilde C(s_i) = (s_iE - A)^{-T}C^T$, $i = 1, \ldots, l$.
More generally, if we use

$$
\begin{aligned}
\mathrm{range}\{V\} &= \mathrm{span}\{\tilde B(s_1), \ldots, \tilde A^{q_1-1}(s_1)\tilde B(s_1), \ldots, \tilde B(s_l), \ldots, \tilde A^{q_l-1}(s_l)\tilde B(s_l)\}, && (3.12) \\
\mathrm{range}\{W\} &= \mathrm{span}\{\tilde C(s_1), \ldots, \tilde A_c^{q_1-1}(s_1)\tilde C(s_1), \ldots, \tilde C(s_l), \ldots, \tilde A_c^{q_l-1}(s_l)\tilde C(s_l)\}, && (3.13)
\end{aligned}
$$

where $\tilde A(s_k) = (s_kE - A)^{-1}E$, $\tilde A_c(s_k) = (s_kE - A)^{-T}E^T$, $k = 1, \ldots, l$, then we have the
following moment-matching property.


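Theorem 3.3 ([35]). If

$$
\mathrm{range}\{V\} \supseteq \mathrm{span}\{\tilde B(s_1), \ldots, \tilde A^{q_1-1}(s_1)\tilde B(s_1), \ldots, \tilde B(s_l), \ldots, \tilde A^{q_l-1}(s_l)\tilde B(s_l)\},
$$

and

$$
\mathrm{range}\{W\} \supseteq \mathrm{span}\{\tilde C(s_1), \ldots, \tilde A_c^{q_1-1}(s_1)\tilde C(s_1), \ldots, \tilde C(s_l), \ldots, \tilde A_c^{q_l-1}(s_l)\tilde C(s_l)\},
$$

then the transfer function $\hat H(s) = \hat C(s\hat E - \hat A)^{-1}\hat B$ of the ROM (3.3) matches the first $2q_k$
moments of the transfer function of the original system at each expansion point $s_k$, i.e.

$$
m_i(s_k) = \hat m_i(s_k), \quad i = 0, 1, \ldots, 2q_k - 1, \quad k = 1, \ldots, l,
$$

where $\hat m_i(s_k) = \hat C[-(s_k\hat E - \hat A)^{-1}\hat E]^{i}(s_k\hat E - \hat A)^{-1}\hat B$, $i = 0, 1, \ldots, 2q_k - 1$, are the $i$th-order
moments of $\hat H$ at $s_k$.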
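
For the simplest case $q_k = 1$, Theorem 3.3 says that the ROM interpolates the transfer function and its first derivative at every expansion point. A minimal numerical check is sketched below, assuming dense matrices, a QR factorization in place of modified Gram-Schmidt, and a hypothetical SISO test system; the names `H`, `dH` and `points` are illustrative.

```python
import numpy as np

def H(E, A, B, C, s):
    """Transfer function value C (sE - A)^{-1} B of a SISO system."""
    return (C @ np.linalg.solve(s * E - A, B)).item()

def dH(E, A, B, C, s):
    """First derivative dH/ds = -C (sE - A)^{-1} E (sE - A)^{-1} B."""
    x = np.linalg.solve(s * E - A, B)
    return (-C @ np.linalg.solve(s * E - A, E @ x)).item()

# Hypothetical nonsymmetric SISO test system (illustration only).
n = 20
rng = np.random.default_rng(2)
ones = np.ones(n - 1)
E = np.eye(n)
A = -np.diag(np.linspace(1.0, 10.0, n)) + 0.5 * np.diag(ones, 1) - 0.3 * np.diag(ones, -1)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

points = [0.5, 2.0, 6.0]     # expansion points s_1, ..., s_l, one moment vector each

# range(V) = span{B~(s_k)}, range(W) = span{C~(s_k)}; QR supplies orthonormal bases.
V = np.linalg.qr(np.hstack([np.linalg.solve(s * E - A, B) for s in points]))[0]
W = np.linalg.qr(np.hstack([np.linalg.solve((s * E - A).T, C.T) for s in points]))[0]
Er, Ar, Br, Cr = W.T @ E @ V, W.T @ A @ V, W.T @ B, C @ V

value_gap = max(abs(H(E, A, B, C, s) - H(Er, Ar, Br, Cr, s)) for s in points)
slope_gap = max(abs(dH(E, A, B, C, s) - dH(Er, Ar, Br, Cr, s)) for s in points)
value_scale = max(abs(H(E, A, B, C, s)) for s in points)
slope_scale = max(abs(dH(E, A, B, C, s)) for s in points)
```

Both gaps should be at the level of rounding errors relative to the corresponding scales, even though $W^TV \neq I$ here; the transfer function of the ROM does not depend on the choice of bases for the two subspaces.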


Given expansion points $s_1, \ldots, s_l$, Algorithm 3.1 presents a procedure for computing
the projection matrix $V$ in (3.12).

The matrix $W$ can also be computed using Algorithm 3.1, only by replacing $\tilde B(s_k)$
with $\tilde C(s_k)$, and $\tilde A(s_k)$ with $\tilde A_c(s_k)$. Algorithm 3.1 is also applicable to SISO systems, as
we can see from Step 7. However, for SISO systems, the algorithm can be further simplified,
and the two matrices $W$, $V$ can be easily computed in parallel. Algorithm 3.2 is a
version for SISO systems. In fact, the computed $V, W \in \mathbb{R}^{n \times r}$ from either Algorithm 3.1
or Algorithm 3.2 are not bi-orthogonal, which is not required by the moment-matching
Theorem 3.3. However, for systems with $E = I$, the identity matrix, it is preferred that
the reduced matrix $\hat E = I_r$, the identity matrix of dimension $r$. Then we can use the
transform $W \leftarrow W(V^TW)^{-1}$ to obtain a new $W$, so that $\hat E = W^TEV = W^TV = I_r$. In
the final steps of both algorithms, we need to orthogonalize the columns of the intermediate
matrices using the modified Gram-Schmidt process, which is an algorithm
for orthogonalizing any given group of vectors. The details of the algorithm are given
in Algorithm 3.3. The finally obtained orthogonal vectors are actually orthonormal,
i.e., their norms are all 1. The number of orthogonal vectors is $\tilde l \leq l$, because once
deflation ($\|a_k\|_2 \leq \varepsilon$) in Step 7 occurs, $\tilde l$ will not be increased.
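
The modified Gram-Schmidt process with deflation described above can be sketched in a few lines. This is a minimal illustration assuming real vectors stored as the columns of a NumPy array; unlike Algorithm 3.3, which normalizes the first vector unconditionally, the sketch applies the deflation tolerance to every vector. The function name `mgs` is illustrative.

```python
import numpy as np

def mgs(vectors, eps=1e-7):
    """Modified Gram-Schmidt with deflation.

    `vectors` holds a_1, ..., a_l as columns; the result holds the
    orthonormal v_1, ..., v_ltilde (ltilde <= l) as columns.
    """
    vectors = np.asarray(vectors, dtype=float)
    basis = []
    for a in vectors.T:
        a = a.copy()
        for v in basis:                  # subtract components one at a time
            a -= (v @ a) * v
        norm = np.linalg.norm(a)
        if norm > eps:                   # otherwise: deflation, the vector is dropped
            basis.append(a / norm)
    if not basis:
        return np.empty((vectors.shape[0], 0))
    return np.column_stack(basis)

# The third column is the sum of the first two, so it is deflated.
a = np.array([[1.0, 1.0, 2.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])
Q = mgs(a)
```

Here `Q` has two orthonormal columns although three vectors were supplied, which is the situation $\tilde l < l$ mentioned in the text.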


Remark 3.1. Let size(M, 2) be the MATLAB notation for the number of columns in a
matrix $M$. Then it could happen that size(V, 2) $\neq$ size(W, 2). In this situation, more
computations should be done as follows. Denote $r_V$ = size(V, 2), $r_W$ = size(W, 2); if
$r_V < r_W$, then add $r_W - r_V$ random orthogonal columns to $V$, and vice versa. This way,
the moment-matching property of the ROM remains unchanged due to the definitions
of $V$, $W$ in Theorem 3.3.


Algorithm 3.1: Compute $V$ in (3.12) for a non-parametric MIMO system (3.1).

Input: System matrices $E$, $A$, $B$, $C$, expansion points $s_1, \ldots, s_l$.
Output: Projection matrix $V$.
1: Initialize $a_1 = 0$, $a_2 = 0$, sum = 0, col = 0.
2: for $k = 1, \ldots, l$ do
3:   if (multiple input) then
4:     Orthogonalize the columns in $\tilde B(s_k)$ using the modified Gram-Schmidt process:
       $[v_1, v_2, \ldots, v_{m_k}] = \mathrm{orth}\{\tilde B(s_k)\}$,
5:     sum = $m_k$. ($m_k$ is the number of remaining columns after deflation.)
6:   else
7:     Compute the first column in $V$: $v_1 = \tilde B(s_k)/\|\tilde B(s_k)\|_2$.
8:     sum = 1.
9:   end if
10:  Orthogonalize the columns in $\tilde A(s_k)\tilde B(s_k), \ldots, \tilde A(s_k)^{q_k-1}\tilde B(s_k)$ iteratively as follows:
11:  for $i = 1, 2, \ldots, q_k - 1$ do
12:    $a_2$ = sum.
13:    if $a_1 = a_2$ then
14:      break; go to Step 2
15:    else
16:      for $j = a_1 + 1, \ldots, a_2$ do
17:        $w = \tilde A(s_k)v_j$, col = sum + 1.
18:        for $d = 1, 2, \ldots,$ col $- 1$ do
19:          $h = v_d^Tw$, $w = w - hv_d$.
20:        end for
21:        if $\|w\|_2 > \varepsilon$ ($\varepsilon > 0$ is a small value indicating deflation,
           e.g., $\varepsilon = 10^{-7}$) then
22:          $v_{\mathrm{col}} = w/\|w\|_2$, sum = col.
23:        end if
24:      end for (j)
25:    end if
26:    $a_1 = a_2$.
27:  end for (i)
28:  $V_k = [v_1, \ldots, v_{\mathrm{sum}}]$,
29: end for (k)
30: Orthogonalize the columns in $[V_1, \ldots, V_l]$ by the modified Gram-Schmidt process
    to obtain $V$, i.e. $V := \mathrm{orth}\{V_1, \ldots, V_l\}$.


Algorithm 3.2: Compute $V$ in (3.12) and $W$ in (3.13) for a non-parametric SISO system
(3.1).

Input: System matrices $E$, $A$, $B$, $C$, expansion points $s_1, \ldots, s_l$.
Output: Projection matrices $V$, $W$.
1: Initialize col = 0, col$_v$ = 0, col$_w$ = 0.
2: for $k = 1, \ldots, l$ do
3:   Compute the first column of $V_k$: $v_1 = \tilde B(s_k)/\|\tilde B(s_k)\|_2$.
4:   Compute the first column of $W_k$: $w_1 = \tilde C(s_k)/\|\tilde C(s_k)\|_2$.
5:   col = col + 1.
6:   Orthogonalize the vectors $\tilde A(s_k)\tilde B(s_k), \ldots, \tilde A(s_k)^{q_k-1}\tilde B(s_k)$ against $v_1$ iteratively,
     orthogonalize the vectors $\tilde A_c(s_k)\tilde C(s_k), \ldots, \tilde A_c(s_k)^{q_k-1}\tilde C(s_k)$
     against $w_1$ iteratively, as follows:
7:   for $i = 1, 2, \ldots, q_k - 1$ do
8:     $v = \tilde A(s_k)v_i$,
9:     $w = \tilde A_c(s_k)w_i$,
10:    for $j = 1, 2, \ldots,$ col do
11:      $h = v_j^Tv$, $v = v - hv_j$.
12:      $h = w_j^Tw$, $w = w - hw_j$.
13:    end for
14:    col = col + 1.
15:    if $\|v\|_2 > \varepsilon$ ($\varepsilon > 0$ is a small value indicating deflation, e.g., $\varepsilon = 10^{-7}$) then
16:      $v_{\mathrm{col}} = v/\|v\|_2$,
17:    else
18:      col$_v$ = col $- 1$, stop updating $V_k$.
19:    end if
20:    if $\|w\|_2 > \varepsilon$ then
21:      $w_{\mathrm{col}} = w/\|w\|_2$,
22:    else
23:      col$_w$ = col $- 1$, stop updating $W_k$.
24:    end if
25:  end for
26:  $V_k = [v_1, \ldots, v_{\mathrm{col}_v}]$, $W_k = [w_1, \ldots, w_{\mathrm{col}_w}]$.
27: end for
28: Orthogonalize the columns in $[V_1, \ldots, V_l]$ by the modified Gram-Schmidt process
    to obtain $V$, i.e. $V := \mathrm{orth}\{V_1, \ldots, V_l\}$.
29: Orthogonalize the columns in $[W_1, \ldots, W_l]$ by the modified Gram-Schmidt process
    to obtain $W$, i.e. $W := \mathrm{orth}\{W_1, \ldots, W_l\}$.

Algorithm 3.3: Modified Gram-Schmidt process.

Input: A group of nonzero vectors $a_1, \ldots, a_l$, a deflation tolerance $\varepsilon > 0$.
Output: A group of orthogonalized vectors $v_1, \ldots, v_{\tilde l}$, $\tilde l \leq l$.
1: $v_1 = a_1/\|a_1\|_2$, $\tilde l = 1$.
2: for $k = 2, \ldots, l$ do
3:   for $i = 1, \ldots, \tilde l$ do
4:     $h = v_i^Ta_k$,
5:     $a_k = a_k - hv_i$.
6:   end for
7:   if $\|a_k\|_2 > \varepsilon$ then
8:     $\tilde l = \tilde l + 1$,
9:     $v_{\tilde l} = a_k/\|a_k\|_2$.
10:  end if
11: end for


The issue is then how to (adaptively) choose the multiple expansion points. Many
adaptive techniques have been proposed during the last years [8, 14, 25, 40, 39, 38, 30,
28, 56], where some are more or less heuristic [8, 14, 25, 40]. Based on system theory, an
error bound is derived in [56], but it faces high computational complexity. The residual
of the state vector is simply used in [39] as the error estimator of the ROM. In the
next subsection, we introduce several typical techniques of adaptivity [38, 25, 30, 28].


3.2.4 Selection of expansion points 


3.2.4.1 $\mathcal{H}_2$-optimal iterative rational Krylov algorithm


The iterative rational Krylov algorithm (IRKA) is proposed in [38]. Given a group of
initial expansion points, IRKA adaptively updates the expansion points, and upon
convergence, IRKA produces a ROM satisfying the necessary conditions for $\mathcal{H}_2$-optimality. (See
Equation (5) in Chapter 1 of this volume for the definition of the $\mathcal{H}_2$-norm.) The expansion
points are selected as the mirror images of the poles of the updated ROM at each
iteration. The algorithm is presented as Algorithm 3.4.


Moment-matching property 
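For single-input single-output (SISO) systems, IRKA leads to the following interpolation
property upon convergence:

$$
\begin{aligned}
\hat H(-\hat\lambda_i) &= H(-\hat\lambda_i), \\
\frac{\partial \hat H}{\partial s}(-\hat\lambda_i) &= \frac{\partial H}{\partial s}(-\hat\lambda_i).
\end{aligned}
\tag{3.14}
$$

Here $\hat\lambda_i$, $i = 1, \ldots, r$, are the eigenvalues of the ROM defined in Step 3(b) of Algorithm 3.4.
They are also the poles of the reduced transfer function $\hat H(s)$.


Algorithm 3.4: Iterative Rational Krylov Algorithm (IRKA).

1: Make an initial selection of the expansion points closed under conjugation, i.e.
   $s_1, \ldots, s_i, \bar s_i, \ldots, s_r$, if $s_i$ is a complex variable. Fix a tolerance $\varepsilon$ for the accuracy of
   the ROM. Choose initial directions $b_1, \ldots, b_r$, $c_1, \ldots, c_r$.
2: Choose $V_r$, $W_r$ so that

   range($V_r$) = span$\{\tilde B(s_1)b_1, \ldots, \tilde B(s_i)b_i, \tilde B(\bar s_i)b_{i+1}, \ldots, \tilde B(s_r)b_r\}$,
   range($W_r$) = span$\{\tilde C(s_1)c_1, \ldots, \tilde C(s_i)c_i, \tilde C(\bar s_i)c_{i+1}, \ldots, \tilde C(s_r)c_r\}$

   and $W_r = W_r(V_r^TW_r)^{-1}$. Here $\tilde B(s_i) = (s_iE - A)^{-1}B$, $\tilde C(s_i) = (s_iE - A)^{-T}C^T$, $i = 1, \ldots, r$.
3: While ($\max_{j=1,\ldots,r}\{|s_j - s_j^{\mathrm{old}}|/|s_j|\} > \varepsilon$)
   (a) $\hat E = W_r^TEV_r$, $\hat A = W_r^TAV_r$, $\hat B = W_r^TB$, $\hat C = CV_r$.
   (b) Compute eigenvalues and eigenvectors of $\lambda\hat E - \hat A$ so that $(\hat\lambda_i\hat E - \hat A)y_i = 0$, $i = 1, \ldots, r$.
   (c) Assign $s_i \leftarrow -\hat\lambda_i$ for $i = 1, \ldots, r$, $Y = (y_1, \ldots, y_r)$.
   (d) $\tilde B = \hat B^TY^{-T}$, $\tilde C = \hat CY$, $(b_1, \ldots, b_r) \leftarrow \tilde B$, $(c_1, \ldots, c_r) \leftarrow \tilde C$.
   (e) Update $V_r$ and $W_r$:

       range($V_r$) = span$\{\tilde B(s_1)b_1, \ldots, \tilde B(s_r)b_r\}$,
       range($W_r$) = span$\{\tilde C(s_1)c_1, \ldots, \tilde C(s_r)c_r\}$

       and $W_r = W_r(V_r^TW_r)^{-1}$.
4: $\hat E = W_r^TEV_r$, $\hat A = W_r^TAV_r$, $\hat B = W_r^TB$, $\hat C = CV_r$.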


It is easy to see that the mirror images of the poles of the ROM are selected as the expansion
points, and are updated every time the ROM is updated. In IRKA, $\bar s_i$ is the conjugate
of $s_i$. From the definition of the moments of the transfer function, we know that the
first-order derivative of the transfer function at $s_i$ is the first-order moment $m_1(s_i)$. The
value of the transfer function at $s_i$ is the zeroth-order moment $m_0(s_i)$. Therefore, IRKA
generates ROMs matching the first two moments of the transfer function at each expansion
point $s_i$, $i = 1, \ldots, r$.


Optimality property [38] 
The ROM computed by IRKA satisfies the following necessary conditions of optimality. 


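Theorem 3.4. Let $H(s)$ be the transfer function of a stable SISO system, and $\hat H$ be a local
minimizer of dimension $r$ for the optimal $\mathcal{H}_2$-model reduction problem

$$
\|H - \hat H\|_{\mathcal{H}_2} = \min_{\dim(\tilde H) = r,\ \tilde H:\ \mathrm{stable}} \|H - \tilde H\|_{\mathcal{H}_2},
$$

and suppose that $\hat H(s)$ has simple poles at $\hat\lambda_i$, $i = 1, \ldots, r$, then $\hat H(s)$ interpolates $H(s)$ and
its first derivative at $-\hat\lambda_i$, $i = 1, \ldots, r$:

$$
\hat H(-\hat\lambda_i) = H(-\hat\lambda_i), \qquad \frac{\partial \hat H}{\partial s}(-\hat\lambda_i) = \frac{\partial H}{\partial s}(-\hat\lambda_i).
$$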


Comparing Theorem 3.4 with the moment-matching property in (3.14), we see that 
IRKA constructs a ROM that satisfies the necessary condition of the local optimal prop- 
erty in Theorem 3.4. 
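
To make the fixed-point character of IRKA concrete, the following is a heavily simplified sketch and not Algorithm 3.4 itself. It assumes $E = I$, a symmetric negative definite $A$ and $C = B^T$, so that all ROM poles are real, the two subspaces coincide ($W = V$) and no tangential directions are needed. The test system and the names `irka_symmetric`, `H`, `dH` are hypothetical.

```python
import numpy as np

def irka_symmetric(A, b, shifts, tol=1e-8, max_iter=200):
    """IRKA-style iteration for E = I, A symmetric negative definite, C = b^T."""
    I = np.eye(A.shape[0])

    def basis(points):
        # Orthonormal basis of span{(s_i I - A)^{-1} b}.
        return np.linalg.qr(np.column_stack([np.linalg.solve(s * I - A, b) for s in points]))[0]

    shifts = np.sort(np.asarray(shifts, dtype=float))
    converged = False
    for _ in range(max_iter):
        V = basis(shifts)
        poles = np.linalg.eigvalsh(V.T @ A @ V)          # poles of the current ROM
        new_shifts = np.sort(-poles)                     # mirror images of the poles
        change = np.max(np.abs(new_shifts - shifts) / np.abs(new_shifts))
        if change < tol:
            converged = True
            break
        shifts = new_shifts
    else:
        V = basis(shifts)                                # keep V consistent with the returned shifts
    return V, shifts, converged

def H(A, b, s):
    return b @ np.linalg.solve(s * np.eye(A.shape[0]) - A, b)

def dH(A, b, s):
    x = np.linalg.solve(s * np.eye(A.shape[0]) - A, b)
    return -(x @ x)                                      # valid because A is symmetric

# Hypothetical symmetric test system (illustration only).
n = 20
rng = np.random.default_rng(3)
ones = np.ones(n - 1)
A = -np.diag(np.arange(1.0, n + 1)) + 0.5 * (np.diag(ones, 1) + np.diag(ones, -1))
b = rng.standard_normal(n)

V, shifts, converged = irka_symmetric(A, b, [1.0, 5.0, 15.0])
Ar, br = V.T @ A @ V, V.T @ b
```

Whatever the value of `converged`, the ROM built from the returned `shifts` interpolates $H$ and its first derivative at those shifts; only upon convergence do the shifts coincide with the mirror images of the ROM poles, which is the content of (3.14).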


3.2.4.2 A heuristic technique 


In [25], an adaptive scheme for both choosing expansion points and deciding the num- 
ber of moments is delineated. Generally speaking, the expansion points are chosen 
based on a binary principle. The number of moments matched at each expansion point 
is determined by a tested point which is known to cause the largest error in the inter- 
val of each pair of neighboring expansion points. Using this technique, the projection 
matrix $V$ in (3.12) is adaptively computed, and the ROM is obtained by Galerkin projection
using $W = V$. The only inputs of the algorithm are an acceptable dimension of the
ROM, say $r_{\max}$, as well as the acceptable accuracy of the ROM, tol. $r_{\max}$ will be adjusted
to a proper number during the adaptive scheme if it was selected too small. The details
of the algorithm can be found in [25]. From the numerical examples in [25], the method 
shows its success in automatically obtaining ROMs for several circuit examples. It is 
nevertheless clarified in [25] that the proposed method has difficulty in dealing with 
multi-input and multi-output (MIMO) models with many resonances in the output re- 
sponses. The proposed method may obtain good results for a single-input and single- 
output (SISO) system with many resonances in the output, but it will fail when the 
system is MIMO and possesses multiple resonances in all the output responses of all 
the I-O ports. For such systems, an efficient error estimation may help to construct 
more robust and reliable ROMs. In the next subsection, we introduce a greedy-type 
algorithm which adaptively selects the expansion points using a recently developed a 
posteriori error bound. 


3.2.4.3 Scheme based on a posteriori error estimation 


In [28], an a posteriori error bound $\Delta(\tilde\mu)$ for the transfer function of the ROM is proposed.
This will be discussed in Section 3.3.4, where the error bound is defined for
linear parametric systems, and can straightforwardly treat linear non-parametric systems
as a special case. For linear non-parametric systems, the error bound $\Delta(\tilde\mu)$ actually
depends only on $s$, i.e. $\tilde\mu = s$. $\Delta(s)$ can be computed following (3.25) and (3.26),
except that $\tilde\mu$ is replaced by $s$.
Similar to the reduced basis method (see Chapter 4 of Volume 2), the next expansion
point $\hat s$ is iteratively selected as the point at which the error bound is maximized.
Using the error bound, the ROM can be automatically generated by Algorithm 3.5. The
projection matrices $W$ and $V$ for constructing the ROM are extended iteratively by the
matrices $W_{\hat s}$ and $V_{\hat s}$ generated at the selected expansion point $\hat s$, until the error bound
is below the error tolerance $\varepsilon_{tol}$. The so-called training set $\Xi_{train}$ is a set of samples of
$s$, which is given by the user, and which should cover the interesting range of the frequency
axis. The expansion points are selected from $\Xi_{train}$. The matrices $W^{du}$ and $V^{du}$
are used to compute the error bound; see (3.25).


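Algorithm 3.5: Automatic generation of a reduced model by adaptively selecting expansion
points $\hat s$ for non-parametrized LTI systems.

Input: System matrices $E$, $A$, $B$, $C$, $\varepsilon_{tol} > 0$, $\Xi_{train}$: a large set of samples of $s$, taken over
the interesting range of the frequency.
Output: The projection matrices $W$, $V$.
1: $W = [\,]$, $V = [\,]$, set $\varepsilon = \varepsilon_{tol} + 1$.
2: Initial expansion point: $\hat s \in \Xi_{train}$.
3: while $\varepsilon > \varepsilon_{tol}$ do
4:   range($V_{\hat s}$) = span$\{\tilde B(\hat s), \tilde A(\hat s)\tilde B(\hat s), \ldots, \tilde A^{q-1}(\hat s)\tilde B(\hat s)\}$,
5:   range($W_{\hat s}$) = span$\{\tilde C(\hat s), \tilde A_c(\hat s)\tilde C(\hat s), \ldots, \tilde A_c^{q-1}(\hat s)\tilde C(\hat s)\}$.
6:   $V = \mathrm{orth}\{V, V_{\hat s}\}$, $W^{du} = V$.
7:   $W = \mathrm{orth}\{W, W_{\hat s}\}$, $V^{du} = W$.
8:   $\hat s = \arg\max_{s \in \Xi_{train}} \Delta(s)$.
9:   $\varepsilon = \Delta(\hat s)$.
10: end while


Either $V_{\hat s}$ in Step 6 or $W_{\hat s}$ in Step 7 in Algorithm 3.5 can be computed by Step 1, Steps
3-29 plus Step 31 in Algorithm 3.1. Step 6 or Step 7 of Algorithm 3.5 implements the
modified Gram-Schmidt process, Algorithm 3.3.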
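
The greedy loop of Algorithm 3.5 can be illustrated with a simplified sketch. Three assumptions deviate from the algorithm and should be kept in mind: the true relative error on the training set is used as a stand-in for the error bound $\Delta(s)$ (affordable only for a toy problem), a one-sided projection $W = V$ replaces Steps 5 and 7, and a single moment vector ($q = 1$) is added per expansion point. The test system and the name `greedy_points` are hypothetical.

```python
import numpy as np

def transfer(E, A, B, C, s):
    return (C @ np.linalg.solve(s * E - A, B)).item()

def greedy_points(E, A, B, C, train, tol, max_points):
    """Greedy selection of expansion points from the training set `train`."""
    n = E.shape[0]
    V = np.empty((n, 0), dtype=complex)
    exact = np.array([transfer(E, A, B, C, s) for s in train])    # stand-in data, costly
    s_hat, chosen, err = train[0], [], np.inf
    while err > tol and len(chosen) < max_points:
        chosen.append(s_hat)
        x = np.linalg.solve(s_hat * E - A, B)                     # B~(s_hat)
        V = np.linalg.qr(np.hstack([V, x]))[0]                    # V = orth{V, V_s}
        Vh = V.conj().T                                           # W = V (one-sided)
        Er, Ar, Br, Cr = Vh @ E @ V, Vh @ A @ V, Vh @ B, C @ V
        approx = np.array([transfer(Er, Ar, Br, Cr, s) for s in train])
        errors = np.abs(exact - approx) / np.max(np.abs(exact))
        idx = int(np.argmax(errors))
        s_hat, err = train[idx], errors[idx]                      # next point: largest error
    return V, chosen, err

# Hypothetical nonsymmetric SISO test system (illustration only).
n = 20
rng = np.random.default_rng(4)
ones = np.ones(n - 1)
E = np.eye(n)
A = -np.diag(np.linspace(1.0, 10.0, n)) + 0.5 * np.diag(ones, 1) - 0.3 * np.diag(ones, -1)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

train = 1j * np.logspace(-1, 2, 60)        # samples on the imaginary axis
tol = 1e-6
V, chosen, err = greedy_points(E, A, B, C, train, tol, max_points=n)
```

Replacing the true error by a cheap a posteriori bound is what makes the scheme practical for large $n$; the loop structure stays the same.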


3.2.4.4 Complex expansion points 


Note that the projection matrices V, W computed by the moment-matching method, 
as well as the multi-moment-matching method in the next section, could be complex, 
if the expansion point for the variable s is taken as a complex number. The ROM then 
has complex system matrices, even if the original system matrices are real. 

In order to obtain real reduced system matrices, each complex matrix should be
separated into its real part and its imaginary part, which should then be combined
to obtain a real projection matrix for MOR, i.e. we need to do the following extra
step:

$$
V \leftarrow \mathrm{orth}\{\mathrm{Re}(V), \mathrm{Im}(V)\}, \qquad
W \leftarrow \mathrm{orth}\{\mathrm{Re}(W), \mathrm{Im}(W)\}.
$$

Here and below, Re(·) and Im(·) are the real and imaginary parts of a complex variable,
respectively. Since, in $\mathbb{C}^n$,

$$
\mathrm{range}\{V\} \subseteq \mathrm{colspan}\{\mathrm{Re}(V), \mathrm{Im}(V)\}, \qquad
\mathrm{range}\{W\} \subseteq \mathrm{colspan}\{\mathrm{Re}(W), \mathrm{Im}(W)\},
$$

over $\mathbb{C}$, the moment-matching property in Theorem 3.3 remains unchanged.

The algorithm IRKA in Section 3.2.4.1 can also introduce complex interpolation
points $s_i$ and $\bar s_i$, where $\bar s_i$ is the conjugate of $s_i$. If $s_i$ is complex, then in Step 2 of IRKA,
$(s_iE - A)^{-1}Bb_i$ and $(\bar s_iE - A)^{-1}Bb_{i+1}$ are both complex vectors, which may produce complex
matrices $V_r$, $W_r$. It is nevertheless not difficult to verify that the conjugate of $(s_iE - A)^{-1}Bb_i$
is $(\bar s_iE - A)^{-1}Bb_{i+1}$, so that they have the same real and imaginary (up to the sign) parts.
Therefore, in Step 2, we can replace $(s_iE - A)^{-1}Bb_i$ and $(\bar s_iE - A)^{-1}Bb_{i+1}$ with $\mathrm{Re}[(s_iE - A)^{-1}Bb_i]$
and $\mathrm{Im}[(s_iE - A)^{-1}Bb_i]$ for any complex $s_i$, without changing the subspace.
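
The effect of splitting a complex basis into real and imaginary parts can be checked directly. The sketch below assumes a single complex expansion point, a Galerkin projection, and a truncated SVD as the orthogonalization step (in place of modified Gram-Schmidt); the test system and the name `real_basis` are hypothetical.

```python
import numpy as np

def real_basis(V, eps=1e-10):
    """Real orthonormal basis of colspan{Re(V), Im(V)} via a truncated SVD."""
    stacked = np.hstack([V.real, V.imag])
    U, sig, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, sig > eps * sig[0]]

def transfer(E, A, B, C, s):
    return (C @ np.linalg.solve(s * E - A, B)).item()

# Hypothetical real SISO test system (illustration only).
n = 20
rng = np.random.default_rng(5)
ones = np.ones(n - 1)
E = np.eye(n)
A = -np.diag(np.linspace(1.0, 10.0, n)) + 0.5 * np.diag(ones, 1) - 0.3 * np.diag(ones, -1)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

s = 1.0 + 2.0j
x = np.linalg.solve(s * E - A, B)          # complex moment vector at s
V = real_basis(x)                          # two real columns instead of one complex column
Er, Ar, Br, Cr = V.T @ E @ V, V.T @ A @ V, V.T @ B, C @ V   # real reduced matrices
```

The real basis has twice as many columns as the complex one, and in exchange the ROM interpolates at both $s$ and $\bar s$ while keeping real system matrices.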


3.3 Multi-moment-matching for linear parametric 
systems 


Some parametric model order reduction (PMOR) methods are basically extensions of
MOR methods for non-parametric systems. PMOR methods can be used to compute
the ROM of the parametric system in (3.1), where the vector of parameters $\mu$ should be
symbolically preserved in the ROM as follows:

$$
\begin{aligned}
\hat E(\mu)\frac{dz(t,\mu)}{dt} &= \hat A(\mu)z(t,\mu) + \hat B(\mu)u(t), \\
\hat y(t,\mu) &= \hat C(\mu)z(t,\mu).
\end{aligned}
$$

Here $\hat E(\mu) = W^TE(\mu)V$, $\hat A(\mu) = W^TA(\mu)V$, $\hat B(\mu) = W^TB(\mu)$ and $\hat C(\mu) = C(\mu)V$. A survey
of PMOR methods can be found in [13].


3.3.1 A robust algorithm


Multi-moment matching PMOR methods are reported in [17, 24], which are generalizations
of the moment-matching method [35]. In this section, the robust PMOR algorithm
proposed in [24] is reviewed. For ease of notation, we call this method PMOR-MM. Both
methods in [17, 24] are based on Galerkin projection, i.e. $W = V$. Note that the method
in [24] is already extended to Petrov-Galerkin in [2]. For clarity and simplicity, we use
Galerkin projection to explain the idea and the algorithm. Assume that $E(\mu)$, $A(\mu)$ are
either in the affine form defined as

$$
\begin{aligned}
E(\mu) &= E_0 + E_1\mu_1 + \cdots + E_m\mu_m, \\
A(\mu) &= A_0 + A_1\mu_1 + \cdots + A_m\mu_m,
\end{aligned}
$$

or can be approximated in the affine form above.

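To compute the matrix $V$, a series expansion of the state $x$ in the frequency domain
is used. After Laplace transform (with zero initial condition), the original parametric
system in (3.1) can be written as

$$
\begin{aligned}
G(\tilde\mu)X(\tilde\mu) &= B(\tilde\mu)U(\tilde\mu), \\
Y(\tilde\mu) &= C(\tilde\mu)X(\tilde\mu),
\end{aligned}
\tag{3.15}
$$

where the entries in $\tilde\mu = (\tilde\mu_1, \ldots, \tilde\mu_p)$ are sufficiently smooth functions of the parameters
$\mu_1, \ldots, \mu_m$ and the Laplace variable $s$. $U(\tilde\mu)$ is the Laplace transform of $u(t)$. Due to the
affine form of $E(\mu)$ and $A(\mu)$, $G(\tilde\mu)$ can also be written in affine form as

$$
G(\tilde\mu) = G_0 + G_1\tilde\mu_1 + \cdots + G_p\tilde\mu_p.
$$

As a result, the state vector in the frequency domain can be written as

$$
\begin{aligned}
X(\tilde\mu) &= [G(\tilde\mu)]^{-1}B(\tilde\mu)U(\tilde\mu) \\
             &= [G_0 + G_1\tilde\mu_1 + \cdots + G_p\tilde\mu_p]^{-1}B(\tilde\mu)U(\tilde\mu).
\end{aligned}
\tag{3.16}
$$

Given an expansion point $\tilde\mu^0 = [\tilde\mu_1^0, \ldots, \tilde\mu_p^0]$, $X(\tilde\mu)$ in (3.16) can be expanded as

$$
\begin{aligned}
X(\tilde\mu) &= [I - (\sigma_1M_1 + \cdots + \sigma_pM_p)]^{-1}B_MU(\tilde\mu) \\
             &= \sum_{i=0}^{\infty} (\sigma_1M_1 + \cdots + \sigma_pM_p)^{i}B_MU(\tilde\mu),
\end{aligned}
\tag{3.17}
$$

where $B_M = [G(\tilde\mu^0)]^{-1}B(\tilde\mu)$, $M_i = -[G(\tilde\mu^0)]^{-1}G_i$, $i = 1, 2, \ldots, p$, and $\sigma_i = \tilde\mu_i - \tilde\mu_i^0$, $i =
1, 2, \ldots, p$. We call the coefficients in the above series expansion the moment vectors
(matrices) of the parametrized system. The corresponding multi-moments of the transfer
function are those moment vectors multiplied by $C$ from the left.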

To obtain the projection matrix $V$, instead of directly computing the moment vectors [17],
a numerically robust method is proposed in [29], and a detailed algorithm is
presented in [24]. The method combines the recursions in (3.18), with a repeated modified
Gram-Schmidt process so that the moment vectors are computed implicitly. We
have

$$
\begin{aligned}
R_0 &= B_M, \\
R_1 &= [M_1R_0, \ldots, M_pR_0], \\
R_2 &= [M_1R_1, \ldots, M_pR_1], \\
    &\ \ \vdots \\
R_q &= [M_1R_{q-1}, \ldots, M_pR_{q-1}].
\end{aligned}
\tag{3.18}
$$

Here, $B_M = [G(\tilde\mu^0)]^{-1}B$ if $B(\tilde\mu)$ does not depend on $\tilde\mu$, i.e. $B(\tilde\mu) = B$. Otherwise, $B_M =
[B_{M_1}, \ldots, B_{M_p}]$, $B_{M_i} = [G(\tilde\mu^0)]^{-1}B_i$, $i = 1, \ldots, p$, if $B(\tilde\mu)$ can be approximated in affine
form, e.g., $B(\tilde\mu) \approx B_1\tilde\mu_1 + \cdots + B_p\tilde\mu_p$.

Then $V := V_{\tilde\mu^0}$ is computed as

$$
\mathrm{range}(V_{\tilde\mu^0}) = \mathrm{span}\{R_0, R_1, \ldots, R_q\}_{\tilde\mu^0}, \tag{3.19}
$$

with the sub-index denoting the dependence on the expansion point $\tilde\mu^0$. It is proved
in [24] that the leading multi-moments of the original system match those of the ROM.
The accuracy of the ROM can be improved by increasing the number of terms in (3.19),
whereby more multi-moments can be matched. To be self-contained, we present the
method in Algorithm 3.6.

It is noticed that the dimensions of $R_j$, $j = 1, \ldots, q$, increase exponentially. If the
number $p$ of the parameters is larger than 2, it is advantageous to use multiple point
expansion, such that only the low order moment matrices, e.g., $R_j$, $j \leq 1$, have to be
computed for each expansion point. As a result, the order of the ROM can be kept
small. Given a group of expansion points $\tilde\mu^i$, $i = 0, \ldots, \ell$, a matrix $V_{\tilde\mu^i}$ can be computed
from (3.19) for each $i$ as

$$
\mathrm{range}(V_{\tilde\mu^i}) = \mathrm{span}\{R_0, R_1, \ldots, R_q\}_{\tilde\mu^i}, \tag{3.20}
$$

where $R_j$, $j = 1, \ldots, q$, are defined as in (3.18), with $R_0 = B_M$, $B_M = [G(\tilde\mu^i)]^{-1}B$, or $B_M =
[B_{M_1}, \ldots, B_{M_p}]$, $B_{M_j} = [G(\tilde\mu^i)]^{-1}B_j$, $M_j = -[G(\tilde\mu^i)]^{-1}G_j$, $j = 1, 2, \ldots, p$. The final projection
matrix $V$ is a combination (orthogonalization) of all the matrices $V_{\tilde\mu^i}$:

$$
V = \mathrm{orth}\{V_{\tilde\mu^0}, \ldots, V_{\tilde\mu^\ell}\}. \tag{3.21}
$$
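
The recursion (3.18) combined with repeated orthogonalization can be sketched as follows for a single expansion point. This is an illustration under assumptions, not Algorithm 3.6: the blocks are orthogonalized with a truncated SVD rather than column by column with modified Gram-Schmidt, $B$ is taken to be parameter independent, and the affine terms `G_terms` are a hypothetical dense example with $p = 2$.

```python
import numpy as np

def orth_against(V, X, eps=1e-10):
    """Orthonormal basis of the part of range(X) that is not yet in range(V)."""
    scale = np.linalg.norm(X)
    for _ in range(2):                           # project twice for numerical safety
        X = X - V @ (V.T @ X)
    U, sig, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, sig > eps * scale]               # small singular values: deflation

def assemble(terms, mu):
    """G(mu) = G_0 + mu_1 G_1 + ... + mu_p G_p."""
    return terms[0] + sum(m * G for m, G in zip(mu, terms[1:]))

def multi_moment_basis(G_terms, B, mu0, q):
    """Basis of span{R_0, ..., R_q} from (3.18)-(3.19) at the expansion point mu0."""
    G0 = assemble(G_terms, mu0)
    V = orth_against(np.empty((B.shape[0], 0)), np.linalg.solve(G0, B))   # R_0 = B_M
    last = V
    for _ in range(q):
        if last.shape[1] == 0:                   # nothing new was found: stop
            break
        # Apply every M_i = -G(mu0)^{-1} G_i to the newest directions.
        block = np.hstack([-np.linalg.solve(G0, G @ last) for G in G_terms[1:]])
        last = orth_against(V, block)
        V = np.hstack([V, last])
    return V

# Hypothetical affine test problem with p = 2 (illustration only).
n = 15
rng = np.random.default_rng(6)
G_terms = [np.diag(np.linspace(2.0, 5.0, n))] + [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(2)]
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

mu0 = (0.1, 0.2)
V = multi_moment_basis(G_terms, B, mu0, q=2)

def H_full(mu):
    return (C @ np.linalg.solve(assemble(G_terms, mu), B)).item()

def H_rom(mu):
    Gr = V.T @ assemble(G_terms, mu) @ V         # Galerkin projection, W = V
    return (C @ V @ np.linalg.solve(Gr, V.T @ B)).item()

near = (0.1005, 0.199)                           # a point close to mu0
```

With $p = 2$ and $q = 2$ the basis has $1 + 2 + 4 = 7$ columns in the generic case, which illustrates the exponential growth of the blocks $R_j$ noted above.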


The multi-moment-matching PMOR method is very efficient for linear parametric systems,
especially for systems with affine matrices $E(\mu)$, $A(\mu)$ [22]. The method also performs
well if the matrices are not affine, but can be well approximated in affine
form [13, 30].


Algorithm 3.6: Compute $V = [v_1, v_2, \ldots, v_r]$ for a parametric system (3.1), where $B(\tilde\mu)$
is generally considered as a matrix.

Input: Expansion point $\tilde\mu^0$, moment vectors in $R_0, R_1, \ldots, R_q$.
Output: Projection matrix $V$.
1: Initialize $a_1 = 0$, $a_2 = 0$, sum = 0.
2: if (multiple input) then
3:   Orthogonalize the columns in $R_0$ using the modified Gram-Schmidt
     process: $[v_1, v_2, \ldots, v_{q_1}] = \mathrm{orth}\{R_0\}$,
4:   sum = $q_1$. ($q_1$ is the number of remaining columns after orthogonalization.)
5: else
6:   Compute the first column in $V$: $v_1 = R_0/\|R_0\|_2$, sum = 1.
7: end if
8: Orthogonalize the columns in $R_1, R_2, \ldots, R_q$ iteratively as follows:
9: for $i = 1, 2, \ldots, q$ do
10:  $a_2$ = sum;
11:  for $d = 1, 2, \ldots, p$ do
12:    if $a_1 = a_2$ then
13:      stop
14:    else
15:      for $j = a_1 + 1, \ldots, a_2$ do
16:        $w = G^{-1}(\tilde\mu^0)G_dv_j$, col = sum + 1.
17:        for $k = 1, 2, \ldots,$ col $- 1$ do
18:          $h = v_k^Tw$, $w = w - hv_k$.
19:        end for
20:        if $\|w\|_2 > \varepsilon$ (a small value indicating deflation, e.g., $\varepsilon = 10^{-7}$) then
21:          $v_{\mathrm{col}} = w/\|w\|_2$, sum = col.
22:        end if
23:      end for (j)
24:    end if
25:  end for (d)
26:  $a_1 = a_2$;
27: end for (i)
28: Orthogonalize the columns in $V$ by the modified Gram-Schmidt process.


3.3.2 Applicability to steady systems 


The above PMOR method computes ROMs of the dynamical systems in (3.1). It is easy
to see that the method can be naturally applied to steady systems:

$$
\begin{aligned}
(E_0 + E_1\mu_1 + \cdots + E_m\mu_m)x(\mu) &= B(\mu)u(\mu), \\
y(\mu) &= C(\mu)x(\mu).
\end{aligned}
\tag{3.22}
$$

Comparing (3.22) with the Laplace transformed system (3.15), we see that they have
an identical form. Consequently, the series expansion of $x$ in (3.22) can be obtained
similarly to (3.17). The corresponding moment vectors can also be defined according
to (3.18). Algorithm 3.6 can then be used to compute a projection matrix $V$. Then the
ROM of (3.22) is constructed by a Galerkin projection,

$$
\begin{aligned}
(V^TE_0V + V^TE_1V\mu_1 + \cdots + V^TE_mV\mu_m)z(\mu) &= V^TB(\mu)u(\mu), \\
\hat y(\mu) &= C(\mu)Vz(\mu).
\end{aligned}
$$


3.3.3 Structure-preserving (P)MOR for second-order systems


For the second-order systems

$$
\begin{aligned}
M(\mu)\frac{d^2x(t,\mu)}{dt^2} + D(\mu)\frac{dx(t,\mu)}{dt} + K(\mu)x(t,\mu) &= B(\mu)u(t), \\
y(t,\mu) &= C(\mu)x(t,\mu),
\end{aligned}
\tag{3.23}
$$

often arising from, e.g., mechanical engineering, it is desired that the ROM preserves
the second-order structure, i.e.,

$$
\begin{aligned}
W^TM(\mu)V\frac{d^2z(t,\mu)}{dt^2} + W^TD(\mu)V\frac{dz(t,\mu)}{dt} + W^TK(\mu)Vz(t,\mu) &= W^TB(\mu)u(t), \\
\hat y(t,\mu) &= C(\mu)Vz(t,\mu).
\end{aligned}
\tag{3.24}
$$

Note that PMOR-MM computes the projection matrix using the series expansion of
the state vector $x$ in the frequency domain. After a Laplace transformation (assume
$x(t = 0, \mu) = 0$), the system (3.23) becomes

$$
\begin{aligned}
s^2M(\mu)X(s,\mu) + sD(\mu)X(s,\mu) + K(\mu)X(s,\mu) &= B(\mu)U(s), \\
Y(s,\mu) &= C(\mu)X(s,\mu).
\end{aligned}
$$

Thus

$$
[s^2M(\mu) + sD(\mu) + K(\mu)]X(s,\mu) = B(\mu)U(s),
$$

where $U(s)$ is the Laplace transform of the input signal $u(t)$. Defining $G(\tilde\mu) := s^2M(\mu) +
sD(\mu) + K(\mu)$ with $\tilde\mu := (s^2, s, \mu)$, a projection matrix $V$ can be computed
following (3.15)-(3.21) in Section 3.3.1. Applying a Petrov-Galerkin projection to the
second-order system, we can obtain a second-order ROM as in (3.24). Note that with a
Galerkin projection, also the symmetry and definiteness properties of the coefficient
matrices can be preserved in the ROM.
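
The last remark can be illustrated numerically. In the sketch below the matrices $M$, $D$, $K$ are hypothetical symmetric positive definite test matrices, and $V$ is a random orthonormal placeholder rather than a basis computed from (3.15)-(3.21); the point is only that the Galerkin-reduced coefficient matrices stay symmetric positive definite for any $V$ with full column rank.

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 12, 4

def random_spd(size):
    """Hypothetical symmetric positive definite test matrix."""
    X = rng.standard_normal((size, size))
    return X @ X.T + size * np.eye(size)

M, D, K = random_spd(n), random_spd(n), random_spd(n)

# Placeholder basis with orthonormal columns; in practice V comes from (3.15)-(3.21).
V = np.linalg.qr(rng.standard_normal((n, r)))[0]

# Galerkin projection (W = V) applied term by term keeps the second-order form.
Mr, Dr, Kr = (V.T @ X @ V for X in (M, D, K))

smallest_eigs = [np.linalg.eigvalsh((X + X.T) / 2).min() for X in (Mr, Dr, Kr)]
```

With $W \neq V$ the reduced matrices $W^TMV$, $W^TDV$, $W^TKV$ are in general neither symmetric nor definite.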


3.3.4 Selecting expansion points based on a posteriori error estimation


Note that the projection matrix $V$ in (3.21) depends on the multiple expansion points
$\tilde\mu^i$, $i = 0, \ldots, \ell$.
In this section, we introduce an a posteriori error bound proposed in [28], which is
an error bound for the transfer function $\hat H(\tilde\mu)$ of the ROM. Given the error bound $\Delta(\tilde\mu)$
for the ROM, the expansion points $\tilde\mu^i$ can be adaptively selected, and the projection
matrix $V$ can be automatically computed as shown in Algorithm 3.7.


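Algorithm 3.7: Adaptively selecting expansion points $\tilde\mu^i$, and automatically computing
$V$.

Input: $\varepsilon_{tol}$, set $\varepsilon = \varepsilon_{tol} + 1$, $\Xi_{train}$: a set of samples of $\tilde\mu$ covering the interesting domain.
Output: $V$.
1: $V = [\,]$, $V^{du} = [\,]$.
2: Choose an initial expansion point: $\tilde\mu^0 \in \Xi_{train}$, $i = 0$.
3: while $\varepsilon > \varepsilon_{tol}$ do
4:   range($V_{\tilde\mu^i}$) = span$\{R_0, R_1, \ldots, R_q\}_{\tilde\mu^i}$. (By applying Algorithm 3.6 at expansion
     point $\tilde\mu^i$.)
5:   range($V^{du}_{\tilde\mu^i}$) = span$\{R_0^{du}, R_1^{du}, \ldots, R_q^{du}\}_{\tilde\mu^i}$. (By applying Algorithm 3.6 at expansion
     point $\tilde\mu^i$, and replacing $R_0, R_1, \ldots, R_q$ with $R_0^{du}, R_1^{du}, \ldots, R_q^{du}$ in (3.27).)
6:   $V = \mathrm{orth}\{V, V_{\tilde\mu^i}\}$, $W = V$.
7:   $V^{du} = \mathrm{orth}\{V^{du}, V^{du}_{\tilde\mu^i}\}$, $W^{du} = V^{du}$.
8:   $i = i + 1$.
9:   $\tilde\mu^i = \arg\max_{\tilde\mu \in \Xi_{train}} \Delta(\tilde\mu)$.
10:  $\varepsilon = \Delta(\tilde\mu^i)$.
11: end while.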


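For a MIMO system, the error bound $\Delta(\tilde\mu)$ is defined as

$$
\Delta(\tilde\mu) = \max_{ij} \Delta_{ij}(\tilde\mu).
$$

Here $\Delta_{ij}(\tilde\mu)$ is the error bound for the $(i, j)$th entry of the transfer function (it is a matrix
for MIMO systems) of the ROM, i.e.,

$$
|H_{ij}(\tilde\mu) - \hat H_{ij}(\tilde\mu)| \leq \Delta_{ij}(\tilde\mu).
$$

For a SISO system, there is no need to take the maximum. $\Delta_{ij}(\tilde\mu)$ can be computed as

$$
\Delta_{ij}(\tilde\mu) = \frac{\|r_i^{du}(\tilde\mu)\|_2\,\|r_j^{pr}(\tilde\mu)\|_2}{\beta(\tilde\mu)} + \left|(\hat x_i^{du})^{*}r_j^{pr}(\tilde\mu)\right|. \tag{3.25}
$$

Here and below, $(\cdot)^{*}$ is the conjugate transpose of a vector or a matrix. Let $c_i$ be the $i$th
row of $C(\tilde\mu)$ and $b_j$ be the $j$th column of $B(\tilde\mu)$ in (3.15),

$$
\begin{aligned}
r_j^{pr}(\tilde\mu) &= b_j - G(\tilde\mu)\hat x_j^{pr}, \\
\hat x_j^{pr} &= V[W^TG(\tilde\mu)V]^{-1}W^Tb_j, \\
r_i^{du}(\tilde\mu) &= -c_i^T - G^{*}(\tilde\mu)\hat x_i^{du}, \\
\hat x_i^{du} &= -V^{du}[(W^{du})^TG^{*}(\tilde\mu)V^{du}]^{-1}(W^{du})^Tc_i^T,
\end{aligned}
\tag{3.26}
$$

where $\hat x_j^{pr}$ is the approximate solution to the primal system

$$
G(\tilde\mu)x_j^{pr} = b_j,
$$

and can be computed from the ROM of the primal system obtained with the projection
matrices $W$, $V$. $\hat x_i^{du}$ is the approximate solution to the dual system

$$
G^{*}(\tilde\mu)x_i^{du} = -c_i^T,
$$

and can be computed from the ROM of the dual system obtained with the projection
matrices $W^{du}$, $V^{du}$. The variable $\beta(\tilde\mu)$ is the smallest singular value of the matrix $G(\tilde\mu)$.
The matrix $V^{du}$ can be computed, for example, using (3.20) and (3.21), by replacing
$R_0, \ldots, R_q$ with $R_0^{du}, R_1^{du}, \ldots, R_q^{du}$, where the matrices $G(\tilde\mu^i)$ in $R_0, \ldots, R_q$ are substituted
by $G^{*}(\tilde\mu^i)$, and $B$ is replaced with $-C^T$. More specifically,

$$
\begin{aligned}
R_0^{du} &= C_M, \\
R_1^{du} &= [M_1^{du}R_0^{du}, \ldots, M_p^{du}R_0^{du}], \\
R_2^{du} &= [M_1^{du}R_1^{du}, \ldots, M_p^{du}R_1^{du}], \\
         &\ \ \vdots \\
R_q^{du} &= [M_1^{du}R_{q-1}^{du}, \ldots, M_p^{du}R_{q-1}^{du}],
\end{aligned}
\tag{3.27}
$$

with $R_0^{du} = C_M = -[G^{*}(\tilde\mu^i)]^{-1}C^T$, $M_j^{du} = -[G^{*}(\tilde\mu^i)]^{-1}G_j^{*}$, $j = 1, 2, \ldots, p$. We can take $W^{du} = V^{du}$.
The derivation of $\Delta(\tilde\mu)$ is detailed in [28].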

It is worth pointing out that, although the error bound is dependent on the parameter
$\tilde\mu$, many $\tilde\mu$-independent terms constituting the error bound need to be precomputed
only once, and they are repeatedly used in Algorithm 3.7 for the many samples
of $\tilde\mu$ in $\Xi_{train}$. For example, when computing $\hat x_j^{pr}$ in $r_j^{pr}$, the matrices $W^TG_kV$, $k = 0, 1, \ldots, p$, are
$\tilde\mu$-independent, and need to be computed only once; $W^TG(\tilde\mu)V$ is then derived by assembling
$W^TG_kV$ for any value of $\tilde\mu$.

Algorithm 3.7 is similar to Algorithm 3.5, except that the projection matrix $V$ is
constructed for system (3.1) in the parametric case using Algorithm 3.6. At the $i$th iteration
step, the expansion point $\tilde\mu^i$ is selected as the one maximizing the error bound
$\Delta(\tilde\mu)$. The projection matrix $V$ is enriched by the matrix $V_{\tilde\mu^i}$ corresponding to $\tilde\mu^i$. The
matrix $V^{du}$ aids in computing the error bound. The most costly part of the error bound
is $\beta(\tilde\mu)$, since we need to compute the smallest singular value of the matrix $G(\tilde\mu)$ of
full dimension. The smallest singular value of the projected matrix $W^TG(\tilde\mu)V$ could be
heuristically used to approximate $\beta(\tilde\mu)$. In [44], a method of interpolation is proposed
to compute an approximation of $\beta(\tilde\mu)$, which has been shown to be accurate and cheap.

At the end of this section, we mention a method from [9], which deals with linear
parametric systems with time-varying parameters. There, it is shown that these
parametric systems can be equivalently considered as bilinear systems. Then suitable
MOR methods for bilinear systems can be applied. MOR for bilinear systems will be
discussed in the next section, where MOR based on multi-moment matching or bilinear
IRKA (BIRKA) [9, 10, 12] is introduced.


3.4 Moment-matching MOR for nonlinear systems 


In this section we consider mildly nonlinear systems without parameter variations,

\[
E\,\frac{dx(t)}{dt} = f(x(t)) + Bu(t), \qquad y(t) = Cx(t), \tag{3.28}
\]

where $x(t) \in \mathbb{R}^n$ and $f(\cdot) \in \mathbb{R}^n$ is a nonlinear, vector-valued function depending on
x(t). These nonlinear systems usually come from spatial discretizations of nonlinear
PDEs. The ROM via Petrov–Galerkin projection is obtained as follows:

\[
W^T E V\,\frac{dz(t)}{dt} = W^T f(Vz(t)) + W^T Bu(t),
\]

where $W \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{n \times r}$ with $W^T V = I$.

In the literature, some MOR methods for nonlinear systems are based on moment- 
matching. The quadratic method [16] is the simplest one. The bilinearization method 
[5, 46] is more accurate than the quadratic method. Methods based on variational 
analysis [11, 27, 37, 47, 52], in general, yield smaller errors than the previous two. A 
method based on a piece-wise linear approximation of the nonlinear function f(-) [50] 
could be used when dealing with strong nonlinearities. These methods are extensions 
of the moment-matching methods for linear systems. 


3.4.1 Quadratic method 


We first analyze the quadratic MOR method proposed in [16]. This method approximates
the nonlinear function $f(\cdot)$ by its power series expansion at, e.g., x(0) = 0,
which can be rewritten into a Kronecker product formulation of x(t) [52],

\[
f(x(t)) = f(0) + A_1 x(t) + A_2 (x(t) \otimes x(t)) + A_3 (x(t) \otimes x(t) \otimes x(t)) + \cdots, \tag{3.29}
\]

where $A_1 \in \mathbb{R}^{n \times n}$ is the Jacobian of f and, in general, $A_j \in \mathbb{R}^{n \times n^j}$ denotes a matrix whose
entries correspond to the jth-order partial derivatives of $f_i$ w.r.t. $x_1, \ldots, x_n$, $1 \le i \le n$.
Here $f_i$, $x_k$ are the ith and kth entries of f and x(t), respectively. A quadratic system can
be obtained by a truncation of (3.29),

\[
\frac{dx(t)}{dt} = A_1 x(t) + A_2 (x(t) \otimes x(t)) + Bu(t) + f(0), \qquad y(t) = Cx(t). \tag{3.30}
\]

If f(0) = 0, the projection matrix V is computed as an orthonormal basis of the Krylov
subspace $K_q(A_1^{-1}, A_1^{-1}B)$ as follows:

\[
\mathrm{range}(V) = \mathrm{span}\{A_1^{-1}B, A_1^{-2}B, \ldots, A_1^{-q}B\}.
\]

Note that V is constructed only by use of the linear part of the quadratic system. A ROM
is derived as

\[
V^T V\,\frac{dz(t)}{dt} = V^T A_1 V z(t) + V^T A_2 (Vz(t) \otimes Vz(t)) + V^T Bu(t), \qquad y(t) = CVz(t).
\]


It can be seen that the idea of the quadratic method comes from the moment-matching 
method for linear systems. The projection matrix V is computed in the same way as in 
the previous moment-matching methods, but is applied to the quadratic system. 
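A minimal sketch of the quadratic method, assuming f(0) = 0, small dense matrices, and random test data that are purely illustrative. The helper names `krylov_basis` and `reduce_quadratic` are hypothetical, and a plain QR factorization is used in place of a deflating Gram–Schmidt process.

```python
import numpy as np


def krylov_basis(A1, B, q):
    """Orthonormal basis of span{A1^{-1}B, ..., A1^{-q}B} (QR-based, no deflation)."""
    blocks = []
    R = np.linalg.solve(A1, B)
    for _ in range(q):
        blocks.append(R)
        R = np.linalg.solve(A1, R)
    V, _ = np.linalg.qr(np.hstack(blocks))
    return V


def reduce_quadratic(A1, A2, B, C, V):
    """Galerkin projection of the quadratic system with the basis V."""
    A1r = V.T @ A1 @ V
    # (Vz) kron (Vz) = (V kron V)(z kron z), so A2 can be reduced once and for all.
    A2r = V.T @ A2 @ np.kron(V, V)
    return A1r, A2r, V.T @ B, C @ V


rng = np.random.default_rng(1)
n, q = 8, 3
A1 = -2.0 * np.eye(n) + 0.2 * rng.standard_normal((n, n))
A2 = 0.1 * rng.standard_normal((n, n * n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

V = krylov_basis(A1, B, q)
A1r, A2r, Br, Cr = reduce_quadratic(A1, A2, B, C, V)
```

Because V is built from the linear part only, the reduced linear part matches the first q moments $C A_1^{-k} B$ of the full linear part, while the quadratic term is merely projected.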

If $f(0) \neq 0$, then the system in (3.30) can be reformulated into

\[
\frac{dx(t)}{dt} = A_1 x(t) + A_2 (x(t) \otimes x(t)) + [B, f(0)]\,[u(t)^T, 1]^T, \qquad y(t) = Cx(t).
\]

The input matrix B in (3.30) is replaced by the matrix [B, f(0)], which means f(0) can
always be treated as a part of the input matrix of the system. Therefore, for simplicity,
we assume below that f(0) = 0.


3.4.2 Bilinearization method 


For a nonlinear system with E = I, and with single input, i.e. B is a vector b, a bilinear
system can be obtained by applying the Carleman linearization process to the
nonlinear system (3.28) [52]. In [5, 46], the bilinear system is derived by approximating
f(x(t)) with a degree-2 polynomial in the Carleman linearization process. More specifically,
by use of the first three terms in (3.29), we obtain the following approximation
of f(x(t)):

\[
f(x(t)) \approx A_1 x(t) + A_2 (x(t) \otimes x(t)).
\]

With the definitions

\[
x_\otimes = \begin{pmatrix} x(t) \\ x(t) \otimes x(t) \end{pmatrix}, \quad
B_\otimes = \begin{pmatrix} b \\ 0 \end{pmatrix}, \quad
C_\otimes = (C, 0),
\]
\[
A_\otimes = \begin{pmatrix} A_1 & A_2 \\ 0 & A_1 \otimes I + I \otimes A_1 \end{pmatrix}, \quad
N_\otimes = \begin{pmatrix} 0 & 0 \\ b \otimes I + I \otimes b & 0 \end{pmatrix},
\]

the nonlinear system (3.28) (E = I) can be approximated by the following bilinear
system:

\[
\frac{dx_\otimes}{dt} = A_\otimes x_\otimes + N_\otimes x_\otimes u(t) + B_\otimes u(t), \qquad y(t) = C_\otimes x_\otimes. \tag{3.31}
\]

The derivation can be easily extended to multi-input systems with $B \in \mathbb{R}^{n \times n_I}$ being
a matrix. After a few more calculations, the following bilinear system with multiple
inputs can be obtained:

\[
\frac{dx_\otimes}{dt} = A_\otimes x_\otimes + \sum_{i=1}^{n_I} N_\otimes^i x_\otimes u_i(t) + B_\otimes u(t), \qquad y(t) = C_\otimes x_\otimes, \tag{3.32}
\]

where $u(t) := (u_1(t), \ldots, u_{n_I}(t))^T$, $B := (b_1, \ldots, b_{n_I})$ and

\[
B_\otimes = \begin{pmatrix} B \\ 0 \end{pmatrix}, \quad
N_\otimes^i = \begin{pmatrix} 0 & 0 \\ b_i \otimes I + I \otimes b_i & 0 \end{pmatrix}.
\]


Given a bilinear system with E singular, several modified MOR schemes are pro- 
posed in [1, 12, 33, 34], but will not be discussed in this chapter due to space limita- 
tions. We can see that the above bilinear system is of much larger state-space dimen- 
sion than the original nonlinear system (3.28). In the following we will introduce the 
process of constructing the projection matrix V for MOR. 
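To make the size of the bilinearized system concrete, the single-input matrices defined above can be assembled as in the following sketch. It assumes E = I and small dense matrices; the function name `carleman_bilinear` is a hypothetical helper, not part of any library.

```python
import numpy as np


def carleman_bilinear(A1, A2, b, C):
    """Assemble A, N, B, C of the bilinear system (3.31) for a single input vector b."""
    n = A1.shape[0]
    I = np.eye(n)
    b = b.reshape(n, 1)
    A_bil = np.block([
        [A1, A2],
        [np.zeros((n * n, n)), np.kron(A1, I) + np.kron(I, A1)],
    ])
    N_bil = np.block([
        [np.zeros((n, n)), np.zeros((n, n * n))],
        [np.kron(b, I) + np.kron(I, b), np.zeros((n * n, n * n))],
    ])
    B_bil = np.vstack([b, np.zeros((n * n, 1))])
    C_bil = np.hstack([C, np.zeros((C.shape[0], n * n))])
    return A_bil, N_bil, B_bil, C_bil


# Illustrative data: a state dimension of n = 3 already gives n + n^2 = 12 bilinear states.
rng = np.random.default_rng(4)
n = 3
A_bil, N_bil, B_bil, C_bil = carleman_bilinear(
    rng.standard_normal((n, n)),
    rng.standard_normal((n, n * n)),
    rng.standard_normal(n),
    rng.standard_normal((1, n)),
)
```

The state dimension grows from n to n + n², which is why dense assembly as above is only sensible for small illustrative problems.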

Once the nonlinear system is approximated by the bilinear system (3.31) or (3.32), 
there are several choices for applying MOR. Multi-moment-matching methods extend 
the moment-matching methods for linear systems to bilinear systems by studying the 
transfer function of the bilinear system. Gramian-based bilinear MOR methods con- 
struct the matrices W and V by exploring the Gramian matrices of the bilinear systems. 
We focus on multi-moment-matching methods in this chapter.


3.4.2.1 Constructing W and V


Multivariate transfer functions and multi-moment-matching
The bilinearization MOR methods in [5, 23, 46] construct the matrices W and V, with W = V, by
approximating the transfer function of the bilinear system. Note that only SISO systems
are considered in [5, 46]. For MIMO systems, the expression of the transfer function
will be different; see [43]. In [43], a method similar to that in [46] was extended to
MIMO systems. In the following description, we consider only SISO bilinear systems.
Under the assumption E = I, the identity matrix, the output response of the bilinear
system (3.31) can be expressed by a Volterra series [52],

\[
y(t) = \sum_{k=1}^{\infty} y_k(t),
\]

where $y_k(t)$ is described in (3.33) and (3.34). In (3.34), $h_k^{(reg)}$ is called the regular kernel of
degree k. The multivariate Laplace transform of this kernel defines the kth multivariate
transfer function $H_k^{(reg)}$ in (3.35). We have

\[
y_k(t) = \int_0^{t}\int_0^{t_1}\cdots\int_0^{t_{k-1}} h_k^{(reg)}(t_1, \ldots, t_k)\,u(t - t_1 - \cdots - t_k) \cdots u(t - t_k)\,dt_k \cdots dt_1, \tag{3.33}
\]
\[
h_k^{(reg)}(t_1, \ldots, t_k) = C_\otimes e^{A_\otimes t_k} N_\otimes \cdots e^{A_\otimes t_2} N_\otimes e^{A_\otimes t_1} B_\otimes, \tag{3.34}
\]
\[
H_k^{(reg)}(s_1, \ldots, s_k) = C_\otimes (s_k I - A_\otimes)^{-1} N_\otimes (s_{k-1} I - A_\otimes)^{-1} \cdots N_\otimes (s_1 I - A_\otimes)^{-1} B_\otimes. \tag{3.35}
\]

By using the Neumann expansion,

\[
(I - sA_\otimes^{-1})^{-1} = I + sA_\otimes^{-1} + s^2 A_\otimes^{-2} + s^3 A_\otimes^{-3} + \cdots,
\]

$H_k^{(reg)}(s_1, s_2, \ldots, s_k)$ can be expanded into a multivariable Maclaurin series in (3.36),
with the so-called kth-order multi-moments $m(l_1, \ldots, l_k)$ being defined in (3.37). We
have

\[
H_k^{(reg)}(s_1, \ldots, s_k) = \sum_{l_k=1}^{\infty} \cdots \sum_{l_1=1}^{\infty} m(l_1, \ldots, l_k)\, s_1^{l_1-1} \cdots s_k^{l_k-1}, \tag{3.36}
\]
\[
m(l_1, \ldots, l_k) = (-1)^k C_\otimes A_\otimes^{-l_k} N_\otimes \cdots A_\otimes^{-l_2} N_\otimes A_\otimes^{-l_1} B_\otimes, \qquad l_1, \ldots, l_k = 1, 2, \ldots \tag{3.37}
\]


Deriving W and V
In [7, 23], the BICOMB method constructs the projection matrix V from a series of
Krylov subspaces in the following steps:

\[
\mathrm{range}(V^{(1)}) = K_{q_1}(A_\otimes^{-1}, A_\otimes^{-1} B_\otimes), \tag{3.38}
\]

and for j > 1

\[
\mathrm{range}(V^{(j)}) = K_{q_j}(A_\otimes^{-1}, A_\otimes^{-1} N_\otimes V^{(j-1)}). \tag{3.39}
\]

The final projection matrix V is

\[
\mathrm{range}(V) = \mathrm{orth}\{V^{(1)}, \ldots, V^{(J)}\}. \tag{3.40}
\]

Here,

\[
K_{q_j}(A_\otimes^{-1}, R) := \mathrm{span}\{R, A_\otimes^{-1} R, \ldots, A_\otimes^{-(q_j - 1)} R\}
\]

is a block Krylov subspace generated by $R = A_\otimes^{-1} B_\otimes$ for j = 1, or $R = A_\otimes^{-1} N_\otimes V^{(j-1)}$ for j > 1.
Note that each $V^{(k)}$ actually tries to match the multi-moments of the kth transfer
function $H_k^{(reg)}$; the multi-moment-matching property of the ROM can be found in,
e.g., [23]. Applying $x_\otimes \approx V z_\otimes$ to (3.31), the ROM of the nonlinear system (3.28) is given
by

\[
\frac{dz_\otimes}{dt} = \hat{A}_\otimes z_\otimes + \hat{N}_\otimes z_\otimes u(t) + \hat{B}_\otimes u(t), \qquad y(t) = \hat{C}_\otimes z_\otimes, \tag{3.41}
\]

where $\hat{A}_\otimes = V^T A_\otimes V$, $\hat{N}_\otimes = V^T N_\otimes V$, $\hat{B}_\otimes = V^T B_\otimes$, $\hat{C}_\otimes = C_\otimes V$. For simulation results of the
BICOMB method, we refer to [7]. Algorithm 3.1 can be mildly modified to Algorithm 3.8
to compute $V^{(j)}$ in (3.38)–(3.40). Algorithm 3.8 computes a matrix V from the block
Krylov subspaces defined as follows:

\[
\mathrm{range}(V^{(j)}) = K_{q_j}(M, R_j), \qquad j = 1, \ldots, J. \tag{3.42}
\]

The final projection matrix V is computed as in (3.40). Let $M = A_\otimes^{-1}$, $R_1 = A_\otimes^{-1} B_\otimes$ and
$R_j = A_\otimes^{-1} N_\otimes V^{(j-1)}$ for j > 1; then Algorithm 3.8 can be used to compute the matrix V in (3.40)
for the ROM (3.41) of the bilinear system.


3.4.3 Variational analysis method 


The third set of nonlinear MOR methods [11, 27, 37, 42, 47] originates from variational 
analysis of nonlinear system theory [52]. 


3.4.3.1 Methods using polynomial approximation 


In [27, 42, 47], the original nonlinear system is first approximated by a polynomial system,
then variational analysis is applied to the polynomial system to obtain a reduced
polynomial system. In the following, we describe the method developed in [27]. Its
main difference from the method in [47] is the construction of the projection matrices
$V_2$ and $V_3$, which will be explained later.

Algorithm 3.8: Compute V in (3.40).

Input: Matrices of the block Krylov subspace in (3.42): M, $R_j$, j = 1, 2, ..., J.
Output: Matrix V in (3.40) with orthonormal columns.

 1: Initialize $a_1 = 0$, $a_2 = 0$, sum = 0.
 2: for j = 1, ..., J do
 3:   if $R_j$ is a matrix then
 4:     Orthogonalize the columns in $R_j$ using a modified Gram–Schmidt process: $[v_1, v_2, \ldots, v_{m_j}] = \mathrm{orth}\{R_j\}$.
 5:     sum = $m_j$. ($m_j$ is the number of remaining columns after deflation.)
 6:   else
 7:     Compute the first column in $V_j$: $v_1 = R_j / \|R_j\|_2$. ($R_j$ is a vector.)
 8:     sum = 1.
 9:   end if
10:   for i = 1, 2, ..., $q_j - 1$ do
11:     $a_2$ = sum.
12:     if $a_1 = a_2$ then
13:       break, go to 2.
14:     else
15:       for d = $a_1 + 1$, ..., $a_2$ do
16:         $w = M v_d$, col = sum + 1.
17:         for k = 1, 2, ..., col − 1 do
18:           $h = v_k^T w$, $w = w - h v_k$.
19:         end for
20:         if $\|w\|_2 > \varepsilon$ (a small value indicating deflation, e.g., $\varepsilon = 10^{-7}$) then
21:           $v_{col} = w / \|w\|_2$, sum = col.
22:         end if
23:       end for (d)
24:     end if
25:     $a_1 = a_2$.
26:   end for (i)
27:   $V_j = [v_1, \ldots, v_{sum}]$.
28: end for (j)
29: Orthogonalize the columns in $[V_1, \ldots, V_J]$ by the modified Gram–Schmidt process
    to obtain V, i.e. $V := \mathrm{orth}\{V_1, \ldots, V_J\}$.
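The following is a compact sketch in the spirit of Algorithm 3.8, not a line-by-line transcription. It assumes dense NumPy arrays, takes the starting blocks $R_j$ as already available, and restarts the index bookkeeping for every j. The function names and the random test data are hypothetical.

```python
import numpy as np


def mgs_extend(basis, w, eps):
    """Orthogonalize w against `basis` (modified Gram-Schmidt); append unless deflated."""
    for v in basis:
        w = w - (v @ w) * v
    norm = np.linalg.norm(w)
    if norm > eps:  # vectors with a tiny remainder are deflated (dropped)
        basis.append(w / norm)


def block_krylov(M, R, q, eps=1e-7):
    """Orthonormal vectors spanning {R, M R, ..., M^(q-1) R}, returned as a list."""
    basis = []
    for col in np.atleast_2d(R.T):  # iterate over the columns of R (vector or matrix)
        mgs_extend(basis, col.astype(float), eps)
    a1, a2 = 0, len(basis)
    for _ in range(q - 1):
        if a1 == a2:  # the last sweep added nothing: the subspace is exhausted
            break
        for d in range(a1, a2):
            mgs_extend(basis, M @ basis[d], eps)
        a1, a2 = a2, len(basis)
    return basis


def combined_basis(M, R_list, q_list, eps=1e-7):
    """V = orth{V_1, ..., V_J} with range(V_j) = K_{q_j}(M, R_j)."""
    V = []
    for R, q in zip(R_list, q_list):
        for v in block_krylov(M, R, q, eps):
            mgs_extend(V, v, eps)
    return np.column_stack(V)


rng = np.random.default_rng(3)
n = 10
M = rng.standard_normal((n, n)) / np.sqrt(n)
R1 = rng.standard_normal(n)        # a vector-valued starting block
R2 = rng.standard_normal((n, 2))   # a matrix-valued starting block
V = combined_basis(M, [R1, R2], [3, 2])
```

For the BICOMB setting the blocks cannot all be prepared in advance, since $R_j$ depends on $V^{(j-1)}$; there one would call `block_krylov` for each j in turn before the final orthogonalization.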


With the power series expansion of f(x(t)) in (3.29), the original nonlinear system
(3.28) is first approximated by a degree-2 polynomial system

\[
\frac{dx(t)}{dt} = A_1 x(t) + A_2 (x(t) \otimes x(t)) + Bu(t), \qquad y(t) = Cx(t), \tag{3.43}
\]

or by a degree-3 polynomial system

\[
\frac{dx(t)}{dt} = A_1 x(t) + A_2 (x(t) \otimes x(t)) + A_3 (x(t) \otimes x(t) \otimes x(t)) + Bu(t), \qquad y(t) = Cx(t). \tag{3.44}
\]

Consider the response of (3.28) to a variation of the input $\alpha u(t)$,

\[
\frac{dx(t)}{dt} = f(x(t)) + B(\alpha u(t)), \qquad y(t) = Cx(t), \tag{3.45}
\]

where $\alpha$ is an arbitrarily small-valued variable. Assuming that the response to u(t) = 0
is x(t) = 0 (in [52], it is called a forced response), then x(t), as a function of $\alpha$, can be
expanded into a power series of $\alpha$ around $\alpha = 0$,

\[
x(t) = \alpha x_1(t) + \alpha^2 x_2(t) + \alpha^3 x_3(t) + \cdots, \tag{3.46}
\]

where the zeroth-order term $x_0(t) = x(t, \alpha = 0) = 0$ does not appear: when $\alpha = 0$, we have
$\alpha u(t) = 0$, and the corresponding response $x(t, \alpha = 0) = 0$ is therefore removed from the
expansion. Substituting both (3.46) and (3.29) into the right hand side and (3.46)
into the left hand side of (3.45), we get

\[
\alpha \frac{dx_1(t)}{dt} + \alpha^2 \frac{dx_2(t)}{dt} + \alpha^3 \frac{dx_3(t)}{dt} + \cdots
= \alpha A_1 x_1(t) + \alpha^2 [A_1 x_2(t) + A_2 (x_1(t) \otimes x_1(t))] + \cdots + B(\alpha u(t)).
\]

Since this equation holds for all $\alpha$, coefficients of powers of $\alpha$ can be equated. This
gives the variational equations:

\[
\frac{dx_1(t)}{dt} = A_1 x_1(t) + Bu(t), \tag{3.47}
\]
\[
\frac{dx_2(t)}{dt} = A_1 x_2(t) + A_2 (x_1(t) \otimes x_1(t)), \tag{3.48}
\]
\[
\frac{dx_3(t)}{dt} = A_1 x_3(t) + A_2 (x_1(t) \otimes x_2(t) + x_2(t) \otimes x_1(t)) + A_3 (x_1(t) \otimes x_1(t) \otimes x_1(t)), \tag{3.49}
\]
\[
\vdots
\]
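The expansion (3.46) can be checked numerically on a toy problem. The sketch below assumes a scalar system $\dot{x} = a_1 x + a_2 x^2 + b\,\alpha u(t)$ with zero initial state and a constant input, so that $x \otimes x$ reduces to $x^2$. The coefficients, the input and the hand-written Runge–Kutta integrator are all illustrative choices.

```python
def rk4(rhs, y0, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of y' = rhs(t, y) from t = 0."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (p + 2 * q + 2 * r + s)
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
    return y


a1, a2, b = -1.0, 0.5, 1.0  # illustrative scalar coefficients


def u(t):
    return 1.0  # constant training input (assumption)


def nonlinear_rhs(alpha):
    """Right-hand side of the scalar analogue of (3.45)."""
    return lambda t, y: [a1 * y[0] + a2 * y[0] ** 2 + b * alpha * u(t)]


def variational_rhs(t, y):
    """Scalar analogues of the first two variational equations (3.47) and (3.48)."""
    x1, x2 = y
    return [a1 * x1 + b * u(t), a1 * x2 + a2 * x1 ** 2]


def expansion_errors(alpha, t_end=1.0, steps=400):
    """Errors of the one-term and two-term truncations of (3.46) at t = t_end."""
    x = rk4(nonlinear_rhs(alpha), [0.0], t_end, steps)[0]
    x1, x2 = rk4(variational_rhs, [0.0, 0.0], t_end, steps)
    return abs(x - alpha * x1), abs(x - alpha * x1 - alpha ** 2 * x2)


err_first, err_second = expansion_errors(0.1)
```

Halving $\alpha$ should shrink the two-term error by roughly a factor of eight, reflecting the neglected $\alpha^3 x_3(t)$ term.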


It is worth pointing out that the assumptions on the forced response can be relaxed,
and similar variational equations of $x_\delta = x(t) - \bar{x}(t)$ can be derived. Here, $\bar{x}(t)$ is the
response to a certain input u(t) for a fixed initial state $x(t = 0) = x_0$. For a detailed
discussion, see Section 3.4 in [52].

Deriving W and V
We notice that all of these variational equations are linear systems of order n for the
vectors of the unknowns $x_1(t)$, $x_2(t)$, ..., respectively. Since x(t) is a linear combination
of $x_1(t)$, $x_2(t)$, ... (see (3.46)), it is in the subspace spanned by $x_1(t)$, $x_2(t)$, .... The
projection matrix V can be computed from the subspace containing $x_1(t)$, $x_2(t)$, ....
Building upon this observation, the method in [27] constructs V based on the linear
variational equations (3.47)–(3.49) rather than from the nonlinear system. From
the moment-matching MOR for linear systems, a projection matrix $V_1$ for $x_1(t)$ of the
first linear system (3.47) is constructed as

\[
\mathrm{range}(V_1) = \mathrm{span}\{A_1^{-1}B, A_1^{-2}B, \ldots, A_1^{-q_1}B\}. \tag{3.50}
\]

Then $x_1(t)$ can be approximated by $x_1(t) \approx V_1 z_1(t)$. A projection matrix $V_2$ for $x_2(t)$ of
the second linear system (3.48) is similarly constructed by

\[
\mathrm{range}(V_2) = \mathrm{span}\{A_1^{-1}A_2, A_1^{-2}A_2, \ldots, A_1^{-q_2}A_2\}, \tag{3.51}
\]

such that $x_2(t) \approx V_2 z_2(t)$. A projection matrix $V_3$ for $x_3(t)$ in (3.49) can be derived in a
similar way. From (3.46), we have

\[
x(t) \approx \alpha V_1 z_1(t) + \alpha^2 V_2 z_2(t),
\]

or

\[
x(t) \approx \alpha V_1 z_1(t) + \alpha^2 V_2 z_2(t) + \alpha^3 V_3 z_3(t),
\]

which indicates that the solution x(t) to (3.43) or (3.44) can be approximated by a linear
combination of the columns of $V_1$, $V_2$ or $V_1$, $V_2$, $V_3$. Therefore the final projection matrix
V can be computed as

\[
\mathrm{range}(V) = \mathrm{orth}\{V_1, \ldots, V_J\}, \qquad J = 2 \text{ or } 3. \tag{3.52}
\]


The ROM is thus derived from the polynomial system (3.43) or (3.44) as follows:

\[
\frac{dz(t)}{dt} = V^T A_1 V z(t) + V^T A_2 (Vz(t) \otimes Vz(t)) + V^T Bu(t), \qquad y(t) = CVz(t), \tag{3.53}
\]

or

\[
\frac{dz(t)}{dt} = V^T A_1 V z(t) + V^T A_2 (Vz(t) \otimes Vz(t)) + V^T A_3 (Vz(t) \otimes Vz(t) \otimes Vz(t)) + V^T Bu(t), \qquad y(t) = CVz(t). \tag{3.54}
\]

The advantage of this method is that it has the flexibility of using a more accurate
polynomial system (3.44) to approximate the original nonlinear system. For the
quadratic method, the system (3.30) could also be replaced by a more accurate
polynomial system. However, the projection matrix V computed by the quadratic
method might be less accurate than the matrix V in (3.52), because it is computed using
only the linear part of the nonlinear system. As an approximation of the original
nonlinear system (3.28), the bilinear system is less accurate than the polynomial system
(3.44). Moreover, since the bilinear system is derived by approximating the nonlinear
function f(x) by its power series expansion up to second order, the projection matrix
V also only uses the information of the series expansion of f(x) at most to the second
order, which is less accurate than the matrix V computed by the variational analysis
method.

Again, let $M = A_1^{-1}$, $R_1 = A_1^{-1}B$, $R_2 = A_1^{-1}A_2$ in (3.50), (3.51); then Algorithm 3.8 can
be used to compute V in (3.52). Since $A_2$ has many columns, it is not practical
to use all of them. Instead, one may take only the first several columns, e.g.,
$R_2 = A_1^{-1}A_2(:, 1:q)$, $q \ll n$, where MATLAB notation is used.

In [47], the second projection matrix $V_2$ is constructed from the approximate system
obtained by replacing $x_1$ with $V_1 z_1$ in (3.48):

\[
\frac{dx_2(t)}{dt} = A_1 x_2(t) + A_2 (V_1 \otimes V_1)(z_1(t) \otimes z_1(t)).
\]

Then

\[
\mathrm{range}(V_2) = \mathrm{span}\{A_1^{-1}A_2(V_1 \otimes V_1), A_1^{-2}A_2(V_1 \otimes V_1), \ldots, A_1^{-q_2}A_2(V_1 \otimes V_1)\}. \tag{3.55}
\]

The advantage of this approach is that there are far fewer columns in $A_2(V_1 \otimes V_1)$
than in $A_2$ in (3.51). Thus, $V_2$ in (3.55) matches more moments than $V_2$ in (3.51) if the matrices
have the same number of columns. However, $V_2$ in (3.55) only matches approximate moments
because the input matrix $A_2$ in (3.48) is approximated by $A_2(V_1 \otimes V_1)$ in (3.55). Therefore,
although $V_2$ matches more moments, its accuracy is impaired by the approximate
moments. The accuracy of the two methods is compared in [7].
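The difference in column counts is easy to see on a small example. The sketch below uses random stand-ins for $A_2$ and $V_1$ (illustrative data only) and also checks the Kronecker identity $(V_1 z_1) \otimes (V_1 z_1) = (V_1 \otimes V_1)(z_1 \otimes z_1)$ that underlies the approximate input matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r1 = 6, 2  # full order and number of columns of V_1 (illustrative)

A2 = rng.standard_normal((n, n * n))                 # n^2 = 36 columns
V1, _ = np.linalg.qr(rng.standard_normal((n, r1)))   # stand-in for V_1 from (3.50)

# Input matrix of the approximate second variational system: only r1^2 = 4 columns.
A2_proj = A2 @ np.kron(V1, V1)

# Both sides of the Kronecker identity, applied to a random reduced state z_1.
z1 = rng.standard_normal(r1)
lhs = A2 @ np.kron(V1 @ z1, V1 @ z1)
rhs = A2_proj @ np.kron(z1, z1)
```

With the same budget of basis vectors, each Krylov step on `A2_proj` adds 4 columns instead of 36, which is the sense in which (3.55) reaches higher-order (approximate) moments.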

At the end of this subsection, we would like to mention another method [42],
which is based on both the Volterra series expansion of the output response and
variational analysis. In [42], the original system (3.28) is approximated by the polynomial
system in (3.44). Then the Volterra series representation of the output response
of the polynomial system is employed to introduce the nonlinear transfer functions
of (3.44). The kth-order nonlinear transfer function is similar to the kth transfer function
$H_k^{(reg)}(s_1, s_2, \ldots, s_k)$ for the bilinear system. The projection matrix V is constructed
based on the moments of the nonlinear transfer functions. Instead of performing the
Laplace transform of the Volterra kernels as in (3.35), the nonlinear transfer functions
are computed from the variational linear systems (3.47)–(3.49), whose transfer
functions are equivalent to the first-order, second-order and third-order nonlinear
transfer functions, respectively. The basic idea of [42] is quite similar to the methods
in [5] and [46]. The main difference is that [5] and [46] are based on a bilinear
approximation of the original nonlinear system, whereas [42] is based on the more accurate
approximation (3.44).


3.4.3.2 Methods based on quadratic–bilinearization


All of the previous nonlinear model reduction methods first approximate the nonlinear
function f(x(t)) by a polynomial, then reduce the approximate polynomial system
to a small dimension. When the function f(x(t)) is weakly nonlinear, it is usually
sufficient to approximate it by a degree-2 or degree-3 polynomial. However,
when f(x(t)) is strongly nonlinear, the low-degree polynomial approximation is not
accurate. It is possible to employ higher order polynomials to improve the accuracy,
but with much more complexity. Furthermore, the storage requirement for higher
order polynomials is prohibitive if the matrix dimension is very large. Therefore, these
methods are more suitable for weakly nonlinear systems.

The methods based on quadratic–bilinearization provide a solution to the above
issues of polynomial approximation. Instead of approximating the nonlinear part f(x)
by a polynomial function, equivalent transformations are applied to the nonlinear
system in (3.28). The nonlinear system is first “lifted” to a polynomial system by adding
polynomial algebraic equations or by taking Lie derivatives and adding more differential
equations. The polynomial system is then transformed into a quadratic–bilinear
system by either adding quadratic algebraic equations or taking Lie derivatives again.
No accuracy is lost during the transformations. The detailed explanation can be found
in [36, 37].

The equivalent quadratic–bilinear system is

\[
G_0 \frac{d\tilde{x}}{dt} = G_1 \tilde{x} + G_2 (\tilde{x} \otimes \tilde{x}) + D_1 \tilde{x} u + D_2 (\tilde{x} \otimes \tilde{x}) u + Bu(t), \tag{3.56}
\]

where $\tilde{x}$ is the lifted state vector after more state variables are added to the state
vector x. Notice that in [36, 37], the system (3.56) is called quadratic-linear differential
algebraic equation (QLDAE). However, the system above obviously includes the
bilinear term $D_1 \tilde{x} u$ and the quadratic–bilinear term $D_2 (\tilde{x} \otimes \tilde{x}) u$. Therefore, the notion
quadratic–bilinear differential algebraic equations (QBDAEs) introduced in [11] is
used in this chapter.

Once the QBDAEs are derived after several steps of transformations, the variational
analysis (3.45)–(3.49) in the previous subsection can be applied to the QBDAEs.
The projection matrix V can also be computed likewise. Then a Galerkin projection
can be applied to (3.56) to get the reduced QBDAEs, which are considered as the ROM
for the original nonlinear system in (3.28).

Recall that, from the second variational equation (3.48), the input matrix has
many columns, which makes the computation of the projection matrix $V_2$ tricky.
In [36, 37], a different way of computing the projection matrix V is proposed based on
the transfer functions of the QBDAEs (3.56). The expression of the transfer functions
of the QBDAEs can originally be found in [52]. For example, assuming for simplicity
$G_0 = I$, the first two transfer functions are

\[
\begin{aligned}
H_1(s) &= L^T (sI - G_1)^{-1} B, \\
H_2(s_1, s_2) &= \frac{1}{2!} L^T [(s_1 + s_2)I - G_1]^{-1} \\
&\quad \times \{G_2 [H_1(s_1) \otimes H_1(s_2) + H_1(s_2) \otimes H_1(s_1)] + D_1 [H_1(s_1) + H_1(s_2)]\}.
\end{aligned}
\tag{3.57}
\]


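Using Taylor series expansions of the transfer functions, the matrix V can be recursively
computed from the coefficients of the series expansions. The series expansions
of $H_1$ and $H_2$ about zero (adaptation to nonzero expansion points is straightforward)
are given as

\[
H_1(s) = L^T \sum_{k=0}^{\infty} A^k \tilde{B} s^k,
\]
\[
\begin{aligned}
H_2(s_1, s_2) = -\frac{1}{2!} L^T \sum_{k=0}^{\infty} A^{k+1} (s_1 + s_2)^k \Big\{
& G_2 \Big[ \Big( \sum_{k=0}^{\infty} A^k \tilde{B} s_1^k \Big) \otimes \Big( \sum_{k=0}^{\infty} A^k \tilde{B} s_2^k \Big)
 + \Big( \sum_{k=0}^{\infty} A^k \tilde{B} s_2^k \Big) \otimes \Big( \sum_{k=0}^{\infty} A^k \tilde{B} s_1^k \Big) \Big] \\
& + D_1 \Big[ \sum_{k=0}^{\infty} A^k \tilde{B} s_1^k + \sum_{k=0}^{\infty} A^k \tilde{B} s_2^k \Big] \Big\},
\end{aligned}
\]

where $A = G_1^{-1}$, $\tilde{B} = -G_1^{-1}B$. Inside the braces, $H_1$ enters through its state part
$(sI - G_1)^{-1}B = \sum_{k=0}^{\infty} A^k \tilde{B} s^k$. In [36, 37], the projection matrix V is constructed as

\[
\begin{aligned}
\mathrm{range}(V_1) &= \mathrm{span}\{A^i \tilde{B},\ i \le q\}, \\
\mathrm{range}(V_2) &= \mathrm{span}\{A^i D_1 A^j \tilde{B},\ i + j \le q\}, \\
\mathrm{range}(V_3) &= \mathrm{span}\{A^i G_2 (A^j \tilde{B}) \otimes (A^k \tilde{B}),\ i + j + k \le q,\ k \le j\}, \\
\mathrm{range}(V) &= \mathrm{span}\{V_1, V_2, V_3\}.
\end{aligned}
\tag{3.58}
\]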


It can be seen that, if the input matrix B is a vector, the Kronecker product
$(A^j \tilde{B}) \otimes (A^k \tilde{B})$ is also a vector so that the construction of $V_3$ is easy. In general, if B has m
columns, $(A^j \tilde{B}) \otimes (A^k \tilde{B})$ has $m^2$ columns. The number of columns in $(A^j \tilde{B}) \otimes (A^k \tilde{B})$ is
still moderate if m is small. This is an advantage over the way of computing V through
variational analysis.

Algorithm 3.8 can also be used to compute V in (3.58), where we need to let $M = G_1^{-1}$,
$R_1 = \tilde{B}$, $R_2 = D_1 V_1$, $R_3 = G_2 (V_1 \otimes V_1)$. Note that in order to compute $V_2$, we use $V_1$
instead of $A^j \tilde{B}$ in $R_2$, since $V_1$ is already the basis of the subspace spanned by $A^j \tilde{B}$, $j \le q$.
Similarly, we use $V_1 \otimes V_1$ rather than $A^j \tilde{B} \otimes A^k \tilde{B}$. This way, we avoid the issue of how to
choose proper values of j, k. When $A^j \tilde{B}$ is replaced by $V_1$ in $R_2$, $V_1$ is a matrix instead of
a vector. However, $V_1$ usually has only a few columns, which still keeps the column
dimensions of $R_2$, $R_3$ moderate.

In [11], the method is extended to two-sided projection based on the transfer func- 
tions (3.57) of the QBDAEs. It is proved that by using two-sided projection, the re- 
duced transfer function matches almost twice as many moments of the original trans- 
fer functions as with the one-sided projection used in [36, 37]. Simulation results also 
show better accuracy than the one-sided projection. However, the two-sided projec- 
tion sometimes causes numerical instability, which may produce unstable reduced 
models [11]. 

Note that the subspace dimension in (3.58) will grow exponentially if the coefficients
of the series expansion of the higher order transfer functions, e.g. $H_3(s_1, s_2, s_3)$,
are also included to compute the projection matrix V. This easily leads to a ROM with
no reduced number of equations. In [58], the higher order multivariate transfer functions
$H_2(s_1, s_2)$, $H_3(s_1, s_2, s_3)$, ..., are transformed to single-s transfer functions $H_2(s)$,
$H_3(s)$, ... by association of variables without losing accuracy. The series expansion of
$H_2(s)$ or $H_3(s)$ only depends on the single variable s, such that the exponential growth
of the subspace dimension can be avoided. Compared with the method in [37], a more
compact ROM with the same accuracy can be obtained. The theory on association of
variables can be found in [52].

Recall that, if the original nonlinear system is a system of ODEs, the QBDAEs usually
constitute a system of differential-algebraic equations after quadratic–bilinearization,
i.e., $G_0$ could be singular. In general, it is still unclear how to determine the index
of the QBDAEs, which may cause problems when the ROM is solved.


3.4.3.3 Other variants 


Algorithm IRKA has been extended to the bilinear iterative rational Krylov algorithm
(BIRKA) in [10] to compute the ROMs of bilinear systems. BIRKA iteratively updates
a set of interpolation points such that the ROM satisfies the necessary conditions of
$\mathcal{H}_2$-optimality. Upon convergence, the BIRKA method produces a ROM whose Volterra
series interpolates that of the original bilinear system at the mirror images of the poles
of the ROM. However, the computational cost of BIRKA for large-scale systems is high
and it is also not possible to match higher-order derivatives. Efforts have been made
in [12] to reduce the computational cost of BIRKA for some
special systems.


3.4.4 Trajectory piece-wise linear method 


The trajectory piece-wise linear method in [49, 50] is proposed to deal with strongly
nonlinear systems. An error bound for this method is proposed in [51], where the
stability and passivity of the ROM are also discussed. The trajectory piece-wise linear
method first linearizes the nonlinear function f(x(t)) at a number of linearization
points $x_i$, $i = 0, 1, 2, \ldots, s-1$, then approximates f(x(t)) by the weighted sum of these
linearizations, $f(x_i) + A_i (x - x_i)$. Finally, the original nonlinear system is approximated
by the following weighted sum of linear systems:

\[
\frac{dx(t)}{dt} = \sum_{i=0}^{s-1} \tilde{w}_i(x) f(x_i) + \sum_{i=0}^{s-1} \tilde{w}_i(x) A_i (x - x_i) + Bu(t), \qquad y(t) = Cx(t).
\]


Once a projection matrix V is obtained, the ROM can be obtained using a Galerkin 
projection. In [49, 50], V is obtained by applying the moment-matching method to 
each linearized system. 

The linearization points are chosen by selecting a training input and simulating
the original nonlinear system. The procedure is simply as follows: (1) A linearized
model around state $x_i$ (initially i = 0) is generated. (2) The original nonlinear system
is simulated while $\min_{0 \le j \le i} \|x - x_j\| < \delta$, i.e. while the current state x is close enough
to any of the previous linearization points. (3) A new linearization point $x_{i+1}$ is taken
as the first state violating $\|x - x_j\| < \delta$, then return to step (1). Note that in order to
get the linearization points, the original full system has to be simulated. Instead of
simulating the full system, a fast algorithm for computing an approximate trajectory
is also proposed in the paper.
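Steps (1)–(3) amount to a simple distance test along the training trajectory. A minimal sketch follows, assuming the trajectory is already available as a list of state tuples; the function name and the sample trajectory are hypothetical, and in practice the states would come from simulating the nonlinear system with the training input.

```python
import math


def select_linearization_points(trajectory, delta):
    """Pick linearization points along a precomputed trajectory.

    A state becomes a new linearization point as soon as its distance to every
    previously selected point is at least `delta`.
    """
    points = [trajectory[0]]  # x_0: the initial state is always a linearization point
    for x in trajectory[1:]:
        nearest = min(math.dist(x, xj) for xj in points)
        if nearest >= delta:
            points.append(x)
    return points


# Illustrative trajectory: the state moves along the first coordinate in steps of 0.1.
trajectory = [(0.1 * k, 0.0) for k in range(11)]
points = select_linearization_points(trajectory, delta=0.25)
```

A smaller `delta` yields more linearization points and hence more linearized models in the weighted sum, which trades accuracy against the cost of building and evaluating the ROM.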

The weak point of this method is that training inputs have to be chosen. In general,
it is unclear how to choose the optimal training inputs so that the trajectory
represents the behavior of the state vector x(t). If the training inputs are chosen far away
from the actual inputs, then the computed trajectory of the unknown vector will depart
from the real behavior of the state vector x(t) and the ROM will lose accuracy.
Computation of the weight functions $\tilde{w}_i$ in the above linear system is also more or less
heuristic. Some related papers based on piece-wise linear ideas are [15, 18, 19, 20, 53, 54, 55].


3.5 Extension to parametric nonlinear systems 


Some of the above nonlinear MOR methods could be extended to deal with parametric
nonlinear systems, though little relevant work has been done. Often the nonlinear
system also involves parameter variations, i.e.,

\[
E(\mu) \frac{dx(t, \mu)}{dt} = f(x(t, \mu), \mu) + B(\mu) u(t), \qquad y(t, \mu) = C(\mu) x(t, \mu), \tag{3.59}
\]

where $\mu \in \mathbb{R}^m$ is the vector of geometrical or physical parameters. When $f(x(t, \mu), \mu)$
is mildly nonlinear, many of the above introduced methods can be extended to solving
(3.59).

The nonlinear system in (3.59) can be transformed to a parametric bilinear system 
using the same technique as introduced in Section 3.4.2, so the resulting system could 
be considered as a linear parametric system, where the input u(t) could be taken as 
an extra parameter. The PMOR-MM method can then be applied to obtain the ROM. 
Extension of both the quadratic method and the variational analysis approach intro- 
duced in Section 3.4.1 and Section 3.4.3 is straightforward. In the following, we discuss 
these extensions in more detail. 


3.5.1 Quadratic PMOR 


The quadratic method in Section 3.4.1 depends on the power series expansion of the
nonlinear function f(x(t)). For parameter dependent $f(x(t, \mu), \mu)$, the corresponding
power series expansion may be written as

\[
f(x(t, \mu), \mu) = f(0) + A_1(\mu) x(t, \mu) + A_2(\mu) (x(t, \mu) \otimes x(t, \mu))
+ A_3(\mu) (x(t, \mu) \otimes x(t, \mu) \otimes x(t, \mu)) + \cdots. \tag{3.60}
\]

The approximated quadratic system is

\[
E(\mu) \frac{dx(t, \mu)}{dt} = A_1(\mu) x(t, \mu) + A_2(\mu) (x(t, \mu) \otimes x(t, \mu)) + \tilde{B}(\mu) \tilde{u}(t), \qquad
y(t, \mu) = C(\mu) x(t, \mu), \tag{3.61}
\]

where $\tilde{B}(\mu) = [B(\mu), f(0)]$, $\tilde{u}(t) = (u(t)^T, 1)^T$. We seek a projection matrix V, which is
used to reduce the linear parametric system

\[
E(\mu) \frac{dx(t, \mu)}{dt} = A_1(\mu) x(t, \mu) + \tilde{B}(\mu) \tilde{u}(t), \qquad
y(t, \mu) = C(\mu) x(t, \mu). \tag{3.62}
\]

Once V is computed from the linear system in (3.62), the ROM is then obtained by
applying a Galerkin projection with V to the quadratic system in (3.61), i.e. the ROM
of (3.59) is

\[
V^T E(\mu) V \frac{dz(t, \mu)}{dt} = V^T A_1(\mu) V z(t, \mu) + V^T A_2(\mu) (V z(t, \mu) \otimes V z(t, \mu)) + V^T \tilde{B}(\mu) \tilde{u}(t), \qquad
y(t, \mu) = C(\mu) V z(t, \mu). \tag{3.63}
\]


Since V is computed from (3.62), the PMOR-MM method can be directly used to
compute V. PMOR-MM is shown to be accurate for MOR of quadratic parametric systems
[6, 26, 57], when the magnitude of the input signal is relatively small. The extension
to two-sided (Petrov–Galerkin) projection is straightforward.


3.5.2 PMOR after bilinearization


Following the bilinearization method introduced in Section 3.4.2, the parametric
nonlinear system in (3.59) can be approximated by a parametric bilinear system as

\[
\frac{dx_\otimes}{dt} = A_\otimes(\mu) x_\otimes + N_\otimes(\mu) x_\otimes u(t) + B_\otimes(\mu) u(t), \qquad
y(t, \mu) = C_\otimes(\mu) x_\otimes.
\]

By considering u(t) associated with the bilinear term $N_\otimes(\mu) x_\otimes u(t)$ to be an extra
parameter, say $u(t) = \mu_{m+1}(t)$, the above system can be viewed as a linear parametric
system,

\[
\frac{dx_\otimes}{dt} = A_\otimes(\mu) x_\otimes + N_\otimes(\mu) x_\otimes \mu_{m+1}(t) + B_\otimes(\mu) u(t), \qquad
y(t, \mu) = C_\otimes(\mu) x_\otimes. \tag{3.64}
\]

Note that the parameter $\mu_{m+1}(t)$ is time-varying. Strictly speaking, the PMOR-MM
method in Section 3.3 cannot be directly used, since the Laplace transform of (3.64)
cannot be applied as in (3.15). However, it is found that directly applying PMOR-MM to
some systems with time-varying parameters [24, 41] or to the bilinear system [2] may
also produce accurate results.


3.5.3 PMOR based on variational analysis 


The variational analysis method in Section 3.4.3 can easily be extended to deal with
parametric nonlinear systems. It can be seen that from the power series expansion of
$f(x(t, \mu), \mu)$ in (3.60), one can obtain the parametric variational equations

\[
E(\mu) \frac{dx_1(t, \mu)}{dt} = A_1(\mu) x_1(t, \mu) + B(\mu) u(t), \tag{3.65}
\]
\[
E(\mu) \frac{dx_2(t, \mu)}{dt} = A_1(\mu) x_2(t, \mu) + A_2(\mu) (x_1(t, \mu) \otimes x_1(t, \mu)), \tag{3.66}
\]
\[
E(\mu) \frac{dx_3(t, \mu)}{dt} = A_1(\mu) x_3(t, \mu) + A_2(\mu) (x_1(t, \mu) \otimes x_2(t, \mu) + x_2(t, \mu) \otimes x_1(t, \mu))
+ A_3(\mu) (x_1(t, \mu) \otimes x_1(t, \mu) \otimes x_1(t, \mu)), \tag{3.67}
\]
\[
\vdots
\]

For each linear parametric system in (3.65)–(3.67), the PMOR-MM method can be used
to compute the projection matrices $V_1$, $V_2$, $V_3$ corresponding to (3.65)–(3.67), respectively.
The final projection matrix V for the parametric nonlinear system is then the
combination of $V_1$, $V_2$, $V_3$, and can be computed following (3.52). The ROM is of a form
similar to (3.54):

\[
\begin{aligned}
V^T E(\mu) V \frac{dz(t, \mu)}{dt} &= V^T A_1(\mu) V z(t, \mu) + V^T A_2(\mu) (V z(t, \mu) \otimes V z(t, \mu)) \\
&\quad + V^T A_3(\mu) (V z(t, \mu) \otimes V z(t, \mu) \otimes V z(t, \mu)) + V^T B(\mu) u(t), \\
y(t, \mu) &= C(\mu) V z(t, \mu).
\end{aligned}
\]


3.6 Conclusions 


This chapter reviews moment-matching methods for MOR of a wide range of systems,
including standard linear time-invariant (LTI) systems, parametric LTI systems,
nonlinear non-parametric and nonlinear parametric systems. Sufficient algorithms
are provided to enable most of the methods to be realizable and the results in the
literature to be reproducible. Some algorithms, e.g., Algorithms 3.1–3.2 and
Algorithm 3.8, have not appeared elsewhere. The discussions in some sections, e.g.
Sections 3.3.2, 3.3.3, and 3.5, are also new. It has been demonstrated in numerous
publications that moment-matching methods are powerful MOR tools for many systems and
are useful in many application areas.

Abstract: In this chapter, we present an overview of the so-called modal methods for
reduced order modeling. The naming loosely refers to techniques that aim at constructing
the reduced order basis for reduction without resorting to data, typically obtained
by full order simulations. We focus primarily on linear and nonlinear mechanical
systems stemming from a finite element discretization of the underlying strong
form equations. The nonlinearity is of a geometric nature, i.e. due to redirection of
internal stresses due to large displacements. Intrusive vs non-intrusive techniques (i.e.
requiring or not access to the finite element formulation to construct the reduced order
model) are discussed, and an overview of the most popular methods is presented.


Keywords: Galerkin projection, geometric nonlinearities, modal derivatives,
non-intrusive reduction, nonlinear manifolds

4.1 Introduction


Structural dynamics often relies on Finite Element (FE) models leading to large systems
of equations, which need to be integrated in time to predict time histories of displacements,
strains and stresses. Even in the case of linear models, this task is often prohibitive
in a design context, as the large size of the systems and the large number of
simulations required result in excessive time and storage requirements. This situation 
is exacerbated by the presence of nonlinearities, which essentially implies the eval- 
uation and factorization of configuration-dependent residuals and Jacobians, and an 
iterative process for convergence at each time step. Clearly, model order reduction is 
a must in these cases. 

Historically, linear structural dynamics has resorted to modal decomposition
since its dawn. Essentially, the displacement field is approximated as a superposition
of a few eigenmodes of the system to achieve reduction.

When nonlinearities are present, a plethora of phenomena might occur, such as
a richer harmonic spectrum in the response than that of the applied forcing, internal
resonances, bifurcations, etc. The challenge for a nonlinear reduced order model (NLROM)
is to efficiently capture all these phenomena. As the type of nonlinearity strongly
affects the response, it would make sense to use all the knowledge of the FE model,
usually referred to as the High Fidelity Model (HFM), to devise methods to construct
the NLROM. This strategy, which we name here model-driven or system-driven, is in
some sense opposed to a data-driven approach, which constructs the NLROM essentially
from data coming from simulations of the HFM. This chapter attempts to give
an overview of methods belonging to the former category, and focuses on structural
systems featuring geometric nonlinearities.

Essentially, a NLROM technique tackling such systems faces two challenges. First, 
the derivation of a compact reduction basis on which to project the HFM. Second, an 
efficient calculation of the projected nonlinear terms. As we will discuss, the nonlin- 
ear terms of a FE based HFM are never available in an explicit form at system level, 
and therefore their reduction and projection face some computational bottlenecks 
that need to be addressed. 

An often employed categorization of NLROMs distinguishes between intrusive and
non-intrusive methods: the construction of the NLROM does, or does not, require
access to the element level formulation and the FE assembly. Clearly, a non-intrusive
technique is preferable when only a commercial FE program is available,
which typically releases only forces and, in some cases, Jacobians, at the assembled
level. On the other hand, intrusive methods are typically more systematic and
theoretically sound, at the price of the mentioned need to access codes at the element
level.

In this contribution, we discuss the main methods and point out recent develop- 
ments and trends. We first recap the main concepts applicable to linear systems, and 
then tackle the relevant cases of geometric nonlinearities. 


4.2 Linear systems 


Let us first consider the linear system of second-order ordinary differential equations
resulting from a FE discretization of an elastic continuum. This can be
written

$$M\ddot{u}(t) + C\dot{u}(t) + Ku(t) = p(t), \tag{4.1}$$

where $u \in \mathbb{R}^n$ is the vector of nodal generalized displacements, $M \in \mathbb{R}^{n\times n}$ is the mass
matrix, $C \in \mathbb{R}^{n\times n}$ is the damping matrix, $K \in \mathbb{R}^{n\times n}$ is the stiffness matrix and $p(t) \in \mathbb{R}^n$ is a
forcing vector. The system (4.1) implies a linearization about an equilibrium point. For
many structural dynamics applications, $n$ is typically very large because a fine
discretization is required.

A well established reduction technique in linear structural dynamics is based on
the so-called vibration modes, i.e. the eigenvectors associated with the free undamped
vibration problem, obtained by setting $C = 0$ and $p(t) = 0$ in (4.1),

$$M\ddot{u}(t) + Ku(t) = 0. \tag{4.2}$$

The solution of (4.2) reads

$$u(t) = \phi\, e^{i\omega t}. \tag{4.3}$$

When (4.3) is inserted in (4.2), we obtain the eigenvalue problem

$$(K - \omega^2 M)\phi = 0, \tag{4.4}$$


which admits eigenpairs $(\omega_j^2, \phi_j)$, $j = 1,\dots,n$, as solutions. Physically, each mode $\phi_j$
represents the shape in which the system freely vibrates at the frequency $\omega_j$. It can be
easily shown that the vibration modes are orthogonal with respect to the mass and the
stiffness matrices. Compactly, we can write

$$\phi_j^T M \phi_k = \delta_{jk}, \qquad \phi_j^T K \phi_k = \omega_j^2\,\delta_{jk}, \tag{4.5}$$

where $\delta_{jk} = 1$ for $j = k$, and zero otherwise. In the above, we also use a so-called
mass normalization for the vibration modes, i.e. a scaling of $\phi_j$ that guarantees that
$\phi_j^T M \phi_j = 1$, $j = 1,\dots,n$. Since the eigenmodes of the system constitute an orthonormal
basis spanning $\mathbb{R}^n$, we can operate the following coordinate transformation,
known as Mode Superposition:

$$u(t) = \sum_{k=1}^{n} \phi_k\, q_k(t) = \Phi q(t), \tag{4.6}$$

where $\Phi = [\phi_1, \dots, \phi_n]$ is a matrix containing all the eigenmodes and $q(t) = [q_1(t), \dots, q_n(t)]^T$
is the vector of modal coordinates. By substituting (4.6) in (4.1) and projecting
onto $\Phi$, we obtain


$$\Phi^T M \Phi\,\ddot{q} + \Phi^T C \Phi\,\dot{q} + \Phi^T K \Phi\, q = \Phi^T p. \tag{4.7}$$


It is often the case, in structural dynamics applications, that the damping matrix $C$ is
given as a linear combination of $K$ and $M$,

$$C = \alpha K + \beta M, \tag{4.8}$$

where $\alpha$ and $\beta$ are user defined coefficients. This is done for two reasons. First, the actual
damping is due to several mechanisms (joint friction, internal heating, interaction
with surrounding fluid, etc.) which cannot be physically described by a simple viscous
term as in (4.1), but rather by much more complex, nonlinear laws. The term $C\dot{u}(t)$ is
then designed to provide a velocity dependent linear term in (4.1) that globally represents
the dissipation occurring during motion. Second, the form (4.8), also known
as Rayleigh Damping, allows one to make use of the orthogonality property (4.5), and
yields a decoupled set of equations. By virtue of mode orthogonality and mass normalization
of modes, (4.7) becomes


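$$\ddot{q} + D\dot{q} + \Omega^2 q = \Phi^T p, \tag{4.9}$$

where $\Phi^T p = [\phi_1^T p, \dots, \phi_n^T p]^T \in \mathbb{R}^n$ is the load participation vector,

$$\Omega^2 = \mathrm{diag}(\omega_1^2, \dots, \omega_n^2) \tag{4.10}$$

is a diagonal matrix containing the eigenvalues $\omega_j^2$, $j = 1,\dots,n$, along its diagonal, and

$$D = \Phi^T C \Phi = \Phi^T(\alpha K + \beta M)\Phi = \mathrm{diag}(2\xi_1\omega_1, \dots, 2\xi_n\omega_n), \tag{4.11}$$

with the damping ratios $\xi_j$ found as

$$\xi_j = \frac{1}{2}\left(\alpha\,\omega_j + \frac{\beta}{\omega_j}\right). \tag{4.12}$$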


Equation (4.9) indicates that the response of a damped linear system to a given excitation
can be computed from a set of $n$ uncoupled single DOF equations. Each of
these uncoupled equations can be separately integrated using for instance the Laplace
transform and convolution products (see [18]). However, the computation of all eigenmodes
for a large dynamical system with many DOFs is computationally expensive
in practice. It is nevertheless possible to approximate the response of the system using only
a few modes in (4.6), as explained in the next section.
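The decoupling in (4.5) and (4.9)-(4.12) can be checked numerically. The following is a minimal sketch on a toy spring-mass chain; the matrices and the Rayleigh coefficients are assumptions chosen only for illustration and are not taken from the chapter. It relies on `scipy.linalg.eigh`, which for the generalized symmetric problem returns mass-normalized eigenvectors.

```python
import numpy as np
from scipy.linalg import eigh

n = 6
# Toy fixed-fixed spring-mass chain (assumed example, not from the chapter)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
alpha, beta = 0.01, 0.05            # assumed Rayleigh coefficients, C = alpha*K + beta*M
C = alpha * K + beta * M

# Generalized eigenproblem K phi = omega^2 M phi, cf. (4.4);
# eigh(K, M) scales the eigenvectors so that Phi.T @ M @ Phi = I
omega2, Phi = eigh(K, M)
omega = np.sqrt(omega2)

D = Phi.T @ C @ Phi                          # modal damping matrix, cf. (4.11)
xi = 0.5 * (alpha * omega + beta / omega)    # damping ratios, cf. (4.12)

print("max off-diagonal of D:", np.max(np.abs(D - np.diag(np.diag(D)))))
print("damping ratios:", xi)
```

For a general (non-Rayleigh) damping matrix the off-diagonal terms of `D` would not vanish, which is precisely why the form (4.8) is convenient.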


Modal displacement method 
To reduce the computational cost of obtaining the response of a system using mode
superposition, one could include in (4.6) only the modes which effectively participate
in the response. This technique is known as the Modal Displacement Method (MDM). If this
strategy holds, not only do just a few eigenmodes of the system have to be calculated, but
the number of uncoupled modal equations that need to be solved is also significantly
reduced with respect to the size $n$ of the original problem (4.1).

Let us first decompose the applied external load into a spatial distribution
and a time-varying function as

$$p(t) = P_0\, g(t), \tag{4.13}$$

where $P_0 \in \mathbb{R}^{n\times p}$ is a matrix containing $p$ spatial distributions of the applied load and
$g(t) \in \mathbb{R}^p$ represents their time-varying functions. According to the MDM, the motion
of the system (4.1) can be approximated as a superposition of a truncated number of
modes by


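$$u(t) \approx \sum_{k=1}^{m} \phi_k\, \tilde{q}_k(t) = \tilde{\Phi}\tilde{q}(t), \tag{4.14}$$

where $\tilde{\Phi} \in \mathbb{R}^{n\times m}$ is a matrix containing the truncated set of $m \ll n$ kept eigenmodes of
the system, and $\tilde{q}(t)$ are the corresponding modal amplitudes. By substituting (4.14)
in (4.1) and projecting onto $\tilde{\Phi}$, the reduced set of modal equations reads

$$\ddot{\tilde{q}} + \tilde{D}\dot{\tilde{q}} + \tilde{\Omega}^2\tilde{q} = \tilde{\Phi}^T P_0\, g(t) = \tilde{P}_0\, g(t), \tag{4.15}$$

where $\tilde{D} = \tilde{\Phi}^T C \tilde{\Phi}$ and $\tilde{\Omega}^2 = \tilde{\Phi}^T K \tilde{\Phi}$. Once the reduced system in (4.15) is integrated,
the approximate displacement $\tilde{u}$ is recovered as

$$\tilde{u}(t) = \sum_{k=1}^{m} \phi_k\, \tilde{q}_k(t). \tag{4.16}$$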


The MDM can significantly reduce the computational cost of time integration. Typically,
the eigenmodes that are used for the reduction are those whose frequencies
span the frequency content of $g(t)$. While representing the spectral content of $g(t)$,
one should also strive for an accurate projection of the spatial distribution of the
load $P_0$. If this is not guaranteed, a significant error can arise. To quantify this error,
let us define the residual spatial distribution vector, $P_r$, as the portion of the load
distribution vector $P_0$ which is neglected after projecting it on the truncated modal
basis. This residual can be obtained as

$$P_r = P_0 - P_t, \tag{4.17}$$

where

$$P_t = M\tilde{\Phi}\tilde{\Phi}^T P_0; \tag{4.18}$$

see [14] for details of the derivation. Note that, as $m$ increases, $\tilde{\Phi}\tilde{\Phi}^T \to \Phi\Phi^T = M^{-1}$
and therefore $P_r \to 0$, but the required computational effort increases as well. In order
to keep $m$ low and compensate for the effect of modal truncation on the load, two
main methods are outlined here.
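The behaviour of the residual load in (4.17)-(4.18) can be illustrated with a short sketch. The chain model and the point load below are assumed toy data; the function name `residual_load` is hypothetical and introduced only for this example.

```python
import numpy as np
from scipy.linalg import eigh

n = 8
# Toy fixed-fixed spring-mass chain (assumed example, not from the chapter)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
P0 = np.zeros((n, 1))
P0[2, 0] = 1.0                      # single point load on DOF 2

omega2, Phi = eigh(K, M)            # mass-normalized modes, ascending frequency

def residual_load(m):
    """Part of P0 not represented by the first m modes, cf. (4.17)-(4.18)."""
    Phi_m = Phi[:, :m]
    P_t = M @ Phi_m @ (Phi_m.T @ P0)
    return P0 - P_t

for m in (1, 2, 4, n):
    P_r = residual_load(m)
    print(m, np.linalg.norm(P_r))
```

By construction the residual is orthogonal to the kept modes, so it cannot excite them; it vanishes (to round-off) only when all modes are kept.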


Mode acceleration correction 

The MDM is based on the assumption that the reduction basis spans the whole frequency
content of the applied load. Therefore, the modes that are left out would contribute
to the solution in a quasi-static manner, i.e. without changing the velocity and
the acceleration of the full response appreciably. The Mode Acceleration Correction
(MAC) method computes the quasi-static effect of the truncated modes and adds it to
the displacement response obtained from the reduced system using the Modal Displacement
Method. This is done as follows. From (4.1) we can write

$$Ku = p - M\ddot{u} - C\dot{u}. \tag{4.19}$$

Since the acceleration of the system is assumed not to be affected by modal reduction,
we substitute (4.16) into (4.19) and solve it for the displacement, giving

$$\tilde{u} \approx K^{-1}p - \sum_{k=1}^{m} \frac{\phi_k\,\ddot{q}_k}{\omega_k^2} - \sum_{k=1}^{m} \frac{2\xi_k\,\phi_k\,\dot{q}_k}{\omega_k}. \tag{4.20}$$

The acceleration of the $k$th generalized coordinate can be calculated from (4.9) as

$$\ddot{q}_k = \phi_k^T p - \omega_k^2 q_k - 2\xi_k\omega_k\dot{q}_k. \tag{4.21}$$

Equation (4.21) is then introduced into (4.20), and the result is rearranged to give the final form
of the displacement:

$$\tilde{u} = \sum_{k=1}^{m} \phi_k q_k + \left(K^{-1} - \sum_{k=1}^{m} \frac{\phi_k\phi_k^T}{\omega_k^2}\right)p. \tag{4.22}$$
The last term in (4.22) represents a quasi-static correction of the displacement using
MAC. In other words, MAC represents a statically complete alternative to the MDM,
obtained by adding the static response of the deleted modes to the solution. Note that the
missing modes in the truncation are never computed; simply, their static contribution
is added a posteriori.
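The "statically complete" property of (4.22) can be seen in the static limit, where velocities and accelerations vanish and (4.9) reduces to $\omega_k^2 q_k = \phi_k^T p$. The sketch below uses an assumed toy chain and a constant point load; it is only an illustration of the formula, not a dynamic simulation.

```python
import numpy as np
from scipy.linalg import eigh

n, m = 8, 2
# Toy fixed-fixed spring-mass chain (assumed example, not from the chapter)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
p = np.zeros(n)
p[-1] = 1.0                                   # constant load on the last DOF

omega2, Phi = eigh(K, M)
Phi_m, omega2_m = Phi[:, :m], omega2[:m]

# Static limit of (4.9): omega_k^2 q_k = phi_k^T p
q = (Phi_m.T @ p) / omega2_m

u_exact = np.linalg.solve(K, p)               # full static solution K^{-1} p
u_mdm = Phi_m @ q                             # plain modal displacement method

# (4.22): add the static contribution of the truncated modes,
# (K^{-1} - sum_k phi_k phi_k^T / omega_k^2) p, without computing those modes
u_mac = u_mdm + u_exact - Phi_m @ ((Phi_m.T @ p) / omega2_m)

print("MDM error:", np.linalg.norm(u_mdm - u_exact))
print("MAC error:", np.linalg.norm(u_mac - u_exact))
```

In a dynamic analysis the first term would come from integrating the reduced equations (4.15), while the correction term requires one static solve with $K$ per load pattern.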


Modal truncation augmentation 

The inaccuracy in the spatial distribution of the applied load, caused by the mode
displacement projection, can also be reduced by adding a few correction vectors to the
reduction basis. This technique goes under the name of Modal Truncation Augmentation
(MTA) [15]. Let us first compute the static response of the system due to the residual
spatial forces, which is simply given by

$$KX = P_r, \tag{4.23}$$


where $X$ is a matrix collecting Ritz vectors columnwise. This matrix is then employed
to reduce the mass and stiffness matrices of the system as

$$\bar{M} = X^T M X, \qquad \bar{K} = X^T K X. \tag{4.24}$$

Here, $\bar{M}$ and $\bar{K}$ are the reduced mass and stiffness matrices projected onto the space
spanned by the columns of $X$. In the next step, an eigenvalue problem of the form

$$(\bar{K} - \omega_j^2\bar{M})\,w_j = 0 \tag{4.25}$$


is solved to obtain the eigenvectors $w_j$. Finally, the modal truncation augmentation
vectors are found by projecting the Ritz vectors onto the space spanned by the eigenvectors
calculated from (4.25). This is simply

$$\Phi_{\mathrm{MTA}} = XW, \tag{4.26}$$

where $W = [w_1, \dots, w_p]$ is a matrix containing the eigenvectors obtained from (4.25)
in its columns. The final step is to append the modal truncation augmentation vectors
to the reduction basis of truncated eigenmodes as

$$\Psi = [\tilde{\Phi} \;\; \Phi_{\mathrm{MTA}}], \tag{4.27}$$

where $\Psi$ is the enriched basis of the mode displacement method. From here,
the MTA procedure follows the MDM, namely, the matrix $\Psi$ can be used instead of
$\tilde{\Phi}$ in (4.14) to reduce the size of the system while also taking into account the quasi-static
contribution of the truncated modes.

Once the response of the generalized coordinates $q$ is obtained, the physical displacement
can be retrieved as

$$\tilde{u} = \Psi q. \tag{4.28}$$

Likewise, the physical acceleration is

$$\ddot{\tilde{u}} = \Psi\ddot{q}. \tag{4.29}$$

As can be seen from (4.29), the effect of the truncated modes is also reflected in the
acceleration of the system through the basis $\Psi$. This is different from the MAC method,
which attempts only to correct the displacement of the mode displacement reduced
system in a strictly static manner.
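The MTA steps (4.17)-(4.27) can be sketched in a few lines. The model, the load and the variable names (`Phi_mta`, `Psi`) are assumptions made for this illustration. The final static check exploits the fact that the static solution lies in the span of the enriched basis.

```python
import numpy as np
from scipy.linalg import eigh

n, m = 8, 2
# Toy fixed-fixed spring-mass chain (assumed example, not from the chapter)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
P0 = np.zeros((n, 1))
P0[-1, 0] = 1.0

omega2, Phi = eigh(K, M)
Phi_m = Phi[:, :m]                              # kept eigenmodes

P_r = P0 - M @ Phi_m @ (Phi_m.T @ P0)           # residual load, (4.17)-(4.18)
X = np.linalg.solve(K, P_r)                     # Ritz vectors, (4.23)
K_bar, M_bar = X.T @ K @ X, X.T @ M @ X         # (4.24)
_, W = eigh(K_bar, M_bar)                       # small eigenproblem, (4.25)
Phi_mta = X @ W                                 # augmentation vectors, (4.26)
Psi = np.hstack([Phi_m, Phi_mta])               # enriched basis, (4.27)

# Static check: Galerkin solution on Psi versus the full static solution
q = np.linalg.solve(Psi.T @ K @ Psi, Psi.T @ P0[:, 0])
u_mta = Psi @ q
u_exact = np.linalg.solve(K, P0[:, 0])
print("static error with MTA basis:", np.linalg.norm(u_mta - u_exact))
```

Because the residual load is orthogonal to the kept modes, the augmentation vectors come out mass-orthogonal to them, so the enriched basis remains mass-normalized.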


4.3 Substructuring and component mode synthesis 


Before discussing nonlinear problems, it is worth briefly mentioning another practical
strategy for model reduction, namely substructuring. A thorough discussion of
this topic is beyond the scope of this chapter, and the interested reader should refer
to [1] for a comprehensive overview. The reduction methods discussed in this contribution
tackle the whole high fidelity model monolithically, meaning that only one
reduction basis is constructed to reduce the full solution. It is possible, however, to decompose
the system into substructures, which communicate with each other through
common interfaces. Then each substructure can be reduced independently from the
others, while preserving the interface DOFs to allow for assembly. This approach
originated with the seminal work of Hurty [25] and was later developed by Craig and
Bampton [12].

The division of the system into substructures is often dictated by the practical
need to deal with the detailed design and the analysis of single components independently.
The assembled model will then feature common interface DOFs and a few
reduced generalized coordinates relative to each substructure, thus achieving model
order reduction. All techniques aiming at reducing the substructures are based on
what was discussed in the previous section. However, significant differences arise depending
on how the interface compatibility across substructures is enforced, namely
whether a primal or dual assembly is adopted. In the former, common interface degrees
of freedom are eliminated, and therefore the interface forces are not present in
the assembled system. In the latter case, interface forces appear as Lagrange multipliers,
and thus compatibility is not a priori satisfied. For primal assembly, the
Hurty-Craig-Bampton method [12, 25] achieves reduction for each substructure with a basis
formed by static modes obtained by applying unit displacements at the interface,
and vibration modes relative to the substructure clamped at the interface. In the case
of dual assembly, it is natural to resort to the method of MacNeal [42] and Rubin [56],
which constructs a reduction basis using vibration modes of the free substructure complemented
with so-called attachment modes obtained by applying unit forces at the
interface DOFs.
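As a minimal sketch of the primal (Hurty-Craig-Bampton) basis described above, the following builds the static constraint modes and the fixed-interface vibration modes for a single toy substructure. The model, the choice of interface DOF and all variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.linalg import eigh

n, n_modes = 8, 2
# Toy substructure (assumed example): spring chain with one interface DOF
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.linspace(1.0, 2.0, n))
b = np.array([n - 1])                 # interface (boundary) DOFs
i = np.arange(n - 1)                  # interior DOFs

K_ii, K_ib = K[np.ix_(i, i)], K[np.ix_(i, b)]
M_ii = M[np.ix_(i, i)]

# Static constraint modes: unit interface displacement, interior unloaded
Psi_c = -np.linalg.solve(K_ii, K_ib)
# Vibration modes of the substructure clamped at the interface
_, Phi_i = eigh(K_ii, M_ii)
Phi_i = Phi_i[:, :n_modes]

# Reduction basis: columns = [fixed-interface modes, constraint modes]
T = np.zeros((n, n_modes + b.size))
T[np.ix_(i, np.arange(n_modes))] = Phi_i
T[np.ix_(i, n_modes + np.arange(b.size))] = Psi_c
T[b, n_modes + np.arange(b.size)] = 1.0   # interface DOFs are kept as they are

K_r, M_r = T.T @ K @ T, T.T @ M @ T       # reduced substructure matrices

# Static check with a load applied at the interface only
p = np.zeros(n)
p[b] = 1.0
u_cb = T @ np.linalg.solve(K_r, T.T @ p)
u_exact = np.linalg.solve(K, p)
print("static error:", np.linalg.norm(u_cb - u_exact))
```

Since the interface DOFs appear explicitly among the reduced coordinates, reduced substructures of this kind can be assembled like ordinary (super)elements.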

In case of several substructures featuring interfaces with many DOFs, the resulting 
assembled reduced order model could still be of large size. Then a second reduction 
could be performed on the interface DOFs, which could be applied before assembly 
to each component independently, or on the global interface after assembly. A recent 
review of the available methods can be found in [39]. 


4.4 Geometrically nonlinear structural dynamics 


When the displacements about the equilibrium point cannot be considered as small, 
nonlinear geometrical effects might arise and (4.1) loses its validity. In particular, inter- 
nal stresses due to deformations are rotated with respect to the reference, undeformed 
configuration. Typical engineering applications prone to geometric nonlinearities are 
thin-walled structures, where low-frequency and bending-dominated modes cause in-plane
stretching when the displacements are in the order of the thickness of the structure.
Then the system (4.1) is modified as

$$M\ddot{u}(t) + C\dot{u}(t) + f(u(t)) = p(t), \tag{4.30}$$

where $f : \mathbb{R}^n \to \mathbb{R}^n$ is the function of internal nodal forces. The nonlinearity
included here can take various forms, depending on the kinematic and material model
adopted. For instance, for a linear elastic constitutive law and the Green-Lagrange strain
tensor, the resulting internal forces are linear, quadratic and cubic in $u$ [43]. The same
holds for the approximate kinematic model due to von Kármán, which captures the
behavior of beams and plates undergoing out-of-plane deflections in the order of one
thickness of the structure [54]. Other kinematic descriptions, e.g. those obtained by means
of the co-rotational formulation, lead to more complex forms of $f(u)$ [10, 13].

4.4.1 Galerkin projection


In order to achieve a low order version of (4.30), the generalized displacement vector
$u(t)$ can be approximated by a linear combination of vectors, contained in the columns
of the matrix $V \in \mathbb{R}^{n\times m}$, as

$$u(t) \approx Vq(t), \tag{4.31}$$

where $q(t)$ are the generalized modal amplitudes, the unknowns of the Nonlinear Reduced
Order Model (NLROM). By substituting (4.31) in equation (4.30), we get

$$MV\ddot{q}(t) + CV\dot{q}(t) + f(Vq(t)) - p(t) = r(t), \tag{4.32}$$

where $r(t)$ is the residual resulting from the approximation. We can enforce this residual
to be orthogonal to the reduction subspace, by requiring that

$$V^T r = 0. \tag{4.33}$$

This implies

$$\tilde{M}\ddot{q}(t) + \tilde{C}\dot{q}(t) + \tilde{f}(q(t)) = \tilde{p}(t), \tag{4.34}$$

where

$$\tilde{M} = V^T M V, \qquad \tilde{C} = V^T C V, \qquad \tilde{p}(t) = V^T p(t) \tag{4.35}$$

and

$$\tilde{f}(q(t)) = V^T f(Vq(t)). \tag{4.36}$$


The above technique is known as Galerkin projection, according to which the subspaces
on which the solution is sought and on which the residual is projected coincide.
This preserves the symmetry, and thus the stability properties, of the structural dynamics
equations. In other cases, when symmetry is not featured in the full model, the
right and left subspaces do not have to be equal. This latter procedure goes under the
name of Petrov-Galerkin projection. Note that the reduced operators $\tilde{M}$, $\tilde{C}$ and $\tilde{p}(t)$
can be pre-computed offline before time-integrating the ROM. The nonlinear reduced
force $\tilde{f}$, however, still requires evaluation over the whole mesh, and hinders any significant
speed-up that may be achieved from the ROM. In short, an accurate and efficient
reduction relies on

1. a good choice of $V$, and

2. an efficient computation of $\tilde{f}$.


There is a variety of methods that aim at constructing a proper basis $V$. In general, $V$
should fulfill the following, in some sense related, requirements:

1. Span the HFM solution subspace. The choice of the reduction basis vectors is usually
based on the expected spectrum of the response with respect to the spectrum
of the excitation. While in the linear case, because of the superposition principle,
this is an easily fulfilled requirement, nonlinear systems pose several challenges
related to the rise of sub- and super-harmonics in the response.

2. Can be cheaply computed, in terms of computational time.

3. Can handle different cases. It is for instance desirable to construct an NLROM able
to provide accurate responses for different forcing amplitudes and frequencies, or
different material and geometric parameters.


4.4.1.1 Evaluation of nonlinear terms 


For FE applications, the nonlinear forces are evaluated as follows:

$$\tilde{f}(q(t)) = \sum_{e=1}^{n_e} V_e^T f_e(V_e q(t)), \tag{4.37}$$

where $f_e(u_e) \in \mathbb{R}^{N_e}$ is the contribution of the element $e$ to the vector $f(u)$ ($N_e$ being
the number of DOFs of the element $e$), $V_e$ is the restriction of $V$ to the rows indexed
by the DOFs corresponding to element $e$, and $n_e$ is the total number of elements of the
mesh. Since the reduced nonlinear term $\tilde{f}(q(t))$ is evaluated in the space of full variables,
the computational cost associated with its evaluation does not scale with $m$ alone.
Indeed, (4.37) shows that this cost scales linearly with the number of elements in the
structure, and can hence be high for large systems. Thus, despite the reduction in dimensionality
achieved in (4.34), the evaluation of $\tilde{f}(q(t))$ hinders any fast prediction
of the system response using the NLROM. Different strategies are available to overcome
this problem.
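The element-wise evaluation (4.37) and its equivalence with the projection (4.36) can be sketched as follows. The chain of cubic springs, the element force law `f_e` and the random basis are assumptions made for illustration; they stand in for a real FE mesh and a physically meaningful basis.

```python
import numpy as np

n = 10                     # number of DOFs
n_e = n - 1                # two-node spring elements between neighbouring DOFs
k, k3 = 1.0, 0.5           # assumed linear and cubic spring constants
dofs = [np.array([e, e + 1]) for e in range(n_e)]

def f_e(u_e):
    """Internal force of one cubic spring element (toy element law)."""
    d = u_e[1] - u_e[0]
    return (k * d + k3 * d**3) * np.array([-1.0, 1.0])

def f_full(u):
    """Assembled internal force vector f(u) of the HFM."""
    f = np.zeros(n)
    for idx in dofs:
        f[idx] += f_e(u[idx])
    return f

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((n, 3)))   # some orthonormal basis, m = 3

def f_reduced(q):
    """Reduced force assembled element by element, cf. (4.37)."""
    f_t = np.zeros(V.shape[1])
    for idx in dofs:
        V_e = V[idx, :]                 # rows of V belonging to element e
        f_t += V_e.T @ f_e(V_e @ q)
    return f_t

q = rng.standard_normal(3)
print(np.allclose(f_reduced(q), V.T @ f_full(V @ q)))
```

Both routes loop over all `n_e` elements, which is exactly the cost that the strategies in the following subsections try to avoid.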


4.4.1.2 Exact reduced forces 


In the case of polynomial nonlinearities, such as the ones arising from the von Kármán kinematic
model of linear elastic bodies, or from solid continuum FE described in a total Lagrangian
framework, the $i$th component of the HFM nonlinear force $f(u)$ can be written
as

$$f_i = K_{ij}u_j + B_{ijk}u_j u_k + C_{ijkl}u_j u_k u_l, \qquad i,j,k,l = 1,\dots,n, \tag{4.38}$$

where summation over repeated indices is implied, and $K_{ij}$, $B_{ijk}$ and $C_{ijkl}$ are constant
tensors of second, third and fourth order, respectively.
Note that the form (4.38) is never available as written, i.e. the tensors $B_{ijk}$ and
$C_{ijkl}$ are typically never assembled within a FE model, while $K_{ij}$ is the linear stiffness
matrix. When a reduction $u = Vq$ is inserted in (4.38) and a projection on $V$ is performed,
then $\tilde{f}_i$ can be expressed exactly by

$$\tilde{f}_i = \tilde{K}_{ij}q_j + \tilde{B}_{ijk}q_j q_k + \tilde{C}_{ijkl}q_j q_k q_l, \qquad i,j,k,l = 1,\dots,m. \tag{4.39}$$

Once (4.39) is available, no connection to the HFM is needed any longer. For details of
the computation of the reduced tensors $\tilde{K}_{ij}$, $\tilde{B}_{ijk}$ and $\tilde{C}_{ijkl}$, see for instance [66].
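The exactness of (4.39) can be checked on a tiny example by contracting full tensors with the basis. This is only a sketch: the dense random tensors below are placeholders, since (as noted above) the full tensors are never assembled in a real FE code, and [66] should be consulted for how the reduced tensors are actually computed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
# Placeholder full-order tensors (random, for illustration only)
K = rng.standard_normal((n, n))
B = rng.standard_normal((n, n, n))
C = rng.standard_normal((n, n, n, n))
V = rng.standard_normal((n, m))

def f_full(u):
    """Polynomial force of the form (4.38)."""
    return (K @ u
            + np.einsum("ijk,j,k->i", B, u, u)
            + np.einsum("ijkl,j,k,l->i", C, u, u, u))

# Reduced tensors: every index of the full tensor is contracted with V
K_r = np.einsum("ia,ij,jb->ab", V, K, V)
B_r = np.einsum("ia,ijk,jb,kc->abc", V, B, V, V)
C_r = np.einsum("ia,ijkl,jb,kc,ld->abcd", V, C, V, V, V)

def f_red(q):
    """Reduced polynomial force of the form (4.39); no access to the HFM."""
    return (K_r @ q
            + np.einsum("abc,b,c->a", B_r, q, q)
            + np.einsum("abcd,b,c,d->a", C_r, q, q, q))

q = rng.standard_normal(m)
print(np.allclose(f_red(q), V.T @ f_full(V @ q)))
```

The reduced tensors have $m^2$, $m^3$ and $m^4$ entries, so their storage and evaluation cost is independent of $n$ but grows quickly with $m$.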


4.4.1.3 Approximate reduced forces 


When the nonlinearity is not polynomial, for instance because of a nonlinear material
law, or geometric nonlinearities described by a co-rotational formulation [13], one is
facing essentially two possibilities:

1. assuming a certain form of $\tilde{f}(q)$ and identifying its coefficients, or

2. trying to compute $\tilde{f}(q)$ affordably, but still querying the underlying HFM mesh.
This strategy is addressed by the so-called hyper-reduction methods, which are
discussed, at least to some extent, in Section 4.4.1.4.


For option 1, in the case of NLROMs addressing geometrically nonlinear problems, a
typical choice is

$$\tilde{f}_i = K^{(1)}_{ij}q_j + K^{(2)}_{ijk}q_j q_k + K^{(3)}_{ijkl}q_j q_k q_l, \qquad i,j,k,l = 1,2,\dots,m, \tag{4.40}$$

where $K^{(1)}_{ij}$, $K^{(2)}_{ijk}$ and $K^{(3)}_{ijkl}$ are two-, three- and four-dimensional arrays, respectively.
Note the formal equivalence between (4.40) and (4.39). However, (4.40) is simply a
pre-defined, convenient model of something which is essentially unknown. We will
discuss in the following that this form is typically used in non-intrusive methods, in
which the access to element information (which would enable (4.39)) is not possible.
The problem then shifts to designing an efficient identification technique for the coefficients
$K^{(2)}_{ijk}$ and $K^{(3)}_{ijkl}$, as the reduced stiffness matrix $K^{(1)}_{ij}$ is typically available.
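To make the identification idea concrete, here is a minimal single-mode sketch ($m = 1$): the reduced force is sampled at two prescribed amplitudes $\pm a$ through a black-box force routine, and the quadratic and cubic coefficients of (4.40) follow from the even and odd parts of the samples. The routine `f_blackbox`, the toy nonlinearity and the amplitude `a` are assumptions for illustration; this is not one of the specific identification methods from the literature.

```python
import numpy as np

n = 6
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
k2, k3 = 0.2, 0.3                      # assumed toy nonlinear coefficients

def f_blackbox(u):
    """Stand-in for a call to an FE program returning assembled forces."""
    return K @ u + k2 * u**2 + k3 * u**3

v = np.linalg.eigh(K)[1][:, 0]         # single basis vector (unit norm)
K1 = v @ K @ v                         # reduced linear stiffness, assumed known

def f_tilde(q):
    """Reduced force sampled through the black box, cf. (4.36) with m = 1."""
    return v @ f_blackbox(v * q)

a = 0.5                                # prescribed modal amplitude
fp, fm = f_tilde(a), f_tilde(-a)
K2 = (fp + fm) / (2.0 * a**2)                  # even part -> quadratic coefficient
K3 = (fp - fm - 2.0 * K1 * a) / (2.0 * a**3)   # odd part minus linear -> cubic

q_test = 0.8
model = K1 * q_test + K2 * q_test**2 + K3 * q_test**3
print(model, f_tilde(q_test))
```

The match at `q_test` is exact here only because the toy force is itself a cubic polynomial; for a general HFM the identified model depends on the chosen amplitudes. With $m > 1$, the number of required force evaluations grows with the number of coefficient combinations.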


4.4.1.4 Hyper-reduction 


In the cases described before, the nonlinear term is written directly with respect to
the modal reduced coordinates, in order to avoid any element level function evaluations
during the numerical time integration of the NLROM. Another option is possible,
which aims at scaling the computation of the nonlinear reduced terms with the size $m$ of
the reduced coordinate vector, rather than with $n$ (the size of the HFM). This is achievable
only if the nonlinear terms are evaluated in a sparse manner, and if the missing
contributions are in some sense compensated for. Methods pursuing this strategy go
under the name of hyper-reduction, since they introduce a further
reduction on top of the one arising from projection. Among the various proposals, two
methods (and numerous variants) are gaining popularity, namely the Discrete Empirical
Interpolation Method (DEIM) [11] and the Energy Conserving Sampling and Weighting
(ECSW) [16]. These two methods are briefly outlined now.


ECSW 
In ECSW, the nonlinear projected terms are approximated as

$$\tilde{f}(q) = \sum_{e=1}^{n_e} V_e^T f_e(V_e q(t)) \approx \sum_{e\in E} \xi_e V_e^T f_e(V_e q(t)), \tag{4.41}$$

where $\xi_e \in \mathbb{R}^+$ are positive weights, and $|E| \ll n_e$. In other words, (4.41) weights the
element forces of the element set $E$ in order to match the work done by the internal
forces onto the displacements induced by the reduction basis $V$. This procedure shows
striking similarities with Gauss quadrature, where a definite integral is approximated
by the evaluation of the integrand function at specific points, weighted with pre-defined
coefficients. In ECSW, the weights are determined by matching the work done
by the nonlinear forces onto the reduction basis for a set of $n_t$ sampled training forces.
This translates into the minimization problem


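$$\xi = \arg\min_{\xi^* \in \mathbb{R}^{n_e},\ \xi^* \ge 0} \|G\xi^* - b\|_2, \tag{4.42}$$

where

$$G = \begin{bmatrix} g_{11} & \cdots & g_{1n_e} \\ \vdots & \ddots & \vdots \\ g_{n_t 1} & \cdots & g_{n_t n_e} \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ \vdots \\ b_{n_t} \end{bmatrix} \tag{4.43}$$

and

$$g_{ie} = V_e^T f_e(V_e q^{(i)}), \qquad b_i = \tilde{f}(q^{(i)}) = \sum_{e=1}^{n_e} g_{ie}, \tag{4.44}$$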


where $i = 1,\dots,n_t$ and $e = 1,\dots,n_e$. A sparse solution to (4.42) returns a sparse set of elements
$E = \{e : \xi_e > 0\}$. An optimally sparse solution to (4.42) can be obtained by
using a greedy-approach-based algorithm [50]. For the sake of compactness, this algorithm
is not reported here. As an illustrative example, Figure 4.1 shows the ECSW
hyper-reduction of the dynamic response of a slightly curved panel, subjected to a
bi-harmonic loading (for details, refer to [29]). The ECSW selects only 19 elements out of a
mesh of 400, thus decreasing the evaluation cost of the nonlinear terms significantly.
The resulting speed-up (ratio between the computing time of the full and reduced solution)
is about 50 in this example.

Figure 4.1: (Hyper-)reduction on a simply supported slightly curved panel subject to a time-varying
load; see [29] for the details. On the left, the full response (thick black line) is compared to two 
Proper Orthogonal Decomposition (POD) reductions with 5 and 4 modes in the reduction basis, re- 
spectively for POD-1 and POD-2, and to the ECSW response. On the right, corresponding elements 
and their weights selected by the ECSW. In this case, 19 elements (out of 400) are picked. For a brief 
discussion of the POD method, see Section 4.5.4. 
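The structure of the ECSW training problem (4.42)-(4.44) can be sketched on a toy chain of cubic springs. As an assumption of this example, the non-negative least-squares problem is solved with the general-purpose `scipy.optimize.nnls` routine rather than with the sparse greedy solver of [50], and the training coordinates are random instead of coming from HFM simulations.

```python
import numpy as np
from scipy.optimize import nnls

n_e, m, n_t = 40, 2, 5        # elements, reduced DOFs, training samples (assumed)
n = n_e + 1
k, k3 = 1.0, 200.0            # assumed spring constants
dofs = [np.array([e, e + 1]) for e in range(n_e)]

def f_e(u_e):
    """Internal force of one cubic spring element (toy element law)."""
    d = u_e[1] - u_e[0]
    return (k * d + k3 * d**3) * np.array([-1.0, 1.0])

x = np.linspace(0.0, 1.0, n)
V = np.column_stack([np.sin(np.pi * x), np.sin(2.0 * np.pi * x)])

rng = np.random.default_rng(0)
Q = rng.uniform(-1.0, 1.0, size=(n_t, m))     # training reduced coordinates

# G stacks the element contributions g_ie (each of size m), cf. (4.43)-(4.44)
G = np.zeros((n_t * m, n_e))
for i, q in enumerate(Q):
    for e, idx in enumerate(dofs):
        V_e = V[idx, :]
        G[i * m:(i + 1) * m, e] = V_e.T @ f_e(V_e @ q)
b = G.sum(axis=1)                              # unweighted sum over all elements

xi, rnorm = nnls(G, b)                         # non-negative weights, cf. (4.42)
E = np.flatnonzero(xi)                         # selected element set

def f_hyper(q):
    """Hyper-reduced force using only the selected elements, cf. (4.41)."""
    return sum(xi[e] * V[dofs[e], :].T @ f_e(V[dofs[e], :] @ q) for e in E)

print(f"{E.size} of {n_e} elements selected, residual {rnorm:.2e}")
```

On the training samples the weighted sum reproduces the reduced force up to the residual of the fit; accuracy away from the training set is not guaranteed, as discussed at the end of this section.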


DEIM 
The idea underlying the DEIM is somewhat different. Here, the nonlinear force $f$ is
approximated as a superposition of a few force modes collected into the matrix
$U \in \mathbb{R}^{n\times m}$ as

$$f \approx Uc, \tag{4.45}$$

where $c \in \mathbb{R}^m$ are unknown factors. Then a Boolean matrix $P$ selects $m$ rows of (4.45),
so that the factors $c$ can be found as

$$P^T f = P^T U c \;\Rightarrow\; c = (P^T U)^{-1} P^T f. \tag{4.46}$$

The force vector $f$ can then be approximated as

$$f \approx U(P^T U)^{-1} P^T f \tag{4.47}$$

and therefore the reduced forces are

$$\tilde{f} \approx V^T U (P^T U)^{-1} (P^T f). \tag{4.48}$$

Note that, in the above, the term $P^T f$ is grouped within brackets, to highlight that
only the components of $f$ picked by $P$ have to be computed. Moreover, the matrix
$V^T U (P^T U)^{-1}$ can be pre-computed offline once and for all. The issue then shifts to finding
$U$ and $P$. Typically, $U$ is formed by the left singular vectors of a Singular Value Decomposition
(SVD) of sampled training forces, i.e.

$$U\Sigma W^T = F, \tag{4.49}$$

where $F = [f_1, \dots, f_{n_t}]$ contains columnwise all the $n_t$ sampled forces, while $P$ is selected
via a greedy procedure; see [11] for details.
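The steps (4.45)-(4.49) can be sketched as follows. The greedy selection below follows the commonly used DEIM point-selection idea (pick the row where the interpolation residual of the next basis vector is largest); the function names and the toy "training forces" are assumptions made for this illustration and are not taken from [11].

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM row selection for a basis U with independent columns."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, U.shape[1]):
        # Interpolate column l with the previous columns at the selected rows
        c = np.linalg.solve(U[idx, :l], U[idx, l])
        r = U[:, l] - U[:, :l] @ c          # residual vanishes at the rows in idx
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

# Toy training "forces": a smooth nonlinear function sampled on a grid
x = np.linspace(0.0, 1.0, 50)
mus = np.linspace(1.0, 5.0, 20)
F = np.column_stack([np.sin(mu * x) + (mu * x) ** 3 for mu in mus])

m = 4
U = np.linalg.svd(F, full_matrices=False)[0][:, :m]   # left singular vectors, (4.49)
idx = deim_indices(U)                                  # rows picked by P

def deim_approx(f_sampled):
    """Reconstruct f from its entries at the DEIM rows only, cf. (4.47)."""
    return U @ np.linalg.solve(U[idx, :], f_sampled)

f_new = np.sin(2.3 * x) + (2.3 * x) ** 3               # not in the training set
rel_err = np.linalg.norm(deim_approx(f_new[idx]) - f_new) / np.linalg.norm(f_new)
print("selected rows:", idx, "relative error:", rel_err)
```

Only `f_new[idx]`, i.e. $m$ entries of the force vector, enters the reconstruction; in an FE setting each such entry still requires evaluating all elements attached to that DOF, a point taken up in the comparison below.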

While ECSW and DEIM both aim at reducing the evaluation cost of the nonlinear
terms, they differ significantly. ECSW operates directly on elements, while DEIM picks
DOFs. This feature could lead to a decrease of efficiency for DEIM, as the computation
of a specific component of $f$ requires the evaluation of all the elements connected
to such a DOF. A workaround has been proposed in [65], where DEIM is applied on
the unassembled mesh. In this manner, each DOF picked by the algorithm maps to
one element only. To counteract the increased cost of the SVD of training forces (now
unassembled), it is shown that a surrogate quantity per element is sufficient to reproduce
the nonlinear contribution. This variant goes under the name of Surrogate
Unassembled DEIM (SU-DEIM). Note that both methods rely on training data from the
HFM. As such, the hyper-reduced NLROM is optimal with respect to the training set
used for its generation. At the same time, there is no guarantee that the same NLROM
provides an accurate response for a set of parameter values (i.e. load, material properties,
etc.) far from those related to the training set. For this reason, hyper-reduction
is typically employed to generate NLROMs that are local with respect to parameter
values. Techniques then exist that interpolate such NLROMs over the domain of
the parameters. This topic is the subject of other contributions in this book. In any
case, we outline the most common strategies in Section 4.6.


4.5 Reduction methods for geometrically nonlinear 
systems 


As briefly discussed above, geometric nonlinearities induce stretching for bending
and twisting deformations. Think of the beam, pinned at both ends (i.e. the extremities
of the beam cannot move, either axially or transversely) depicted in Figure 4.2. If
it is bent by a pressure load, the bending will cause axial stretching. This effect is not
present when the displacements are assumed infinitesimal. As such, a good reduction
basis should also include vectors able to produce such effects in the reduced solution.
This fact is exemplified when considering the discretized equation of a flat structure,
made of linear elastic and isotropic material, as discussed in the next section.


4.5.1 Static condensation 


For thin-walled structures excited in the nonlinear range, one can argue that the dominant
behavior is still represented by the low-frequency dynamics spanned by bending/twisting
vibration modes. Also, the eigenfrequencies associated with axially dominated
modes are typically much higher, as the axial to bending stiffness


Figure 4.2: Illustration of the bending—stretching coupling due to geometric nonlinearity for a planar 
beam. An out-of-plane displacement w(x, t) causes the beam to stretch, and induce an in-plane 
displacement field v(x, t). 


ratio scales with $h^{-2}$, $h$ being the thickness of the component. As such, axial modes
are likely not to be excited dynamically, but rather have a nonlinear contribution to
the overall displacement. In many cases indeed, it is possible to identify slow and fast
DOFs and have an effective reduction of the problem size. As an example, consider
here the discretized FE equations for an Euler-Bernoulli beam modeled with von Kármán
kinematics. The equations of motion read


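$$\begin{bmatrix} M_{ww} & 0 \\ 0 & M_{vv} \end{bmatrix}\begin{bmatrix} \ddot{w} \\ \ddot{v} \end{bmatrix} + \begin{bmatrix} K_{ww} & 0 \\ 0 & K_{vv} \end{bmatrix}\begin{bmatrix} w \\ v \end{bmatrix} + \begin{bmatrix} {}_2f_w(w,v) + {}_3f_w(w,w,w) \\ {}_2f_v(w,w) \end{bmatrix} = \begin{bmatrix} p_w \\ p_v \end{bmatrix}, \tag{4.50}$$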


where $v$ and $w$ denote the in-plane (axial) and the out-of-plane (bending) displacements,
respectively. The order of the nonlinear forces ${}_2f_w$, ${}_2f_v$ and ${}_3f_w$ is denoted
by the left subscript. As is well known, the in-plane and out-of-plane dynamics are not coupled
through the linear operators ($M_{wv} = M_{vw}^T = 0$, $K_{wv} = K_{vw}^T = 0$). Moreover, for many
structural applications, the load is applied only in the transverse direction, i.e. $p_v = 0$.
For slender structures like the beam under consideration, the in-plane dynamics is
characterized by much higher frequencies than the bending dynamics. This allows
one to neglect the inertial term of the in-plane block, and define $v$ as a function of $w$
as


$$
\ddot{v} \approx 0 \;\Rightarrow\; v = -K_{vv}^{-1}\,{}_2f_v(w, w). \tag{4.51}
$$


In other words, the in-plane DOFs v are a quadratic function of the out-of-plane
generalized DOFs w. Equation (4.51) is usually referred to as static condensation. When
(4.51) is inserted into (4.50) one obtains


$$
M_{ww}\ddot{w} + K_{ww}w + {}_2f_w\!\left(w, -K_{vv}^{-1}\,{}_2f_v(w, w)\right) + {}_3f_w(w, w, w) = p_w. \tag{4.52}
$$


Note that the static condensation of $v$ produces a third-order term which
“corrects” the cubic stiffness term ${}_3f_w$. Physically, this allows the beam to
accommodate the in-plane displacements that arise from finite deflections. Clearly, this
simple procedure is only effective when one can identify the slow and fast DOFs from the
full order model, which is possible only in the case of simple geometries [30]. For more
complex thin-walled structural components, such as curved stiffened panels, the dynamics is
still characterized by some low-frequency modes, which statically trigger geometrically
nonlinear coupling effects. In these cases, however, it is in general not possible to detect
slow and fast DOFs directly from the governing discretized equations. More general methods
to tackle these cases are discussed next.
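The condensation step can be illustrated on a minimal sketch. The two-DOF model below is hypothetical (it is not taken from the text): one transverse DOF `w`, one axial DOF `v`, an axial spring `k_a` that feels the von Kármán-like strain `v + 0.5*c*w**2`, and an axial support spring `k_s`. All numerical values are made up for illustration. Under these assumptions the condensed cubic stiffness has a closed form, which the code compares against.

```python
# Hypothetical two-DOF model (not from the text): w is transverse, v is axial.
# Strain energy: U = 0.5*k_w*w**2 + 0.5*k_a*(v + 0.5*c*w**2)**2 + 0.5*k_s*v**2
k_w, k_a, k_s, c = 1.0, 400.0, 100.0, 2.0   # illustrative values only

K_ww = k_w          # linear out-of-plane stiffness
K_vv = k_a + k_s    # linear in-plane stiffness


def f2_w(w, v):
    """Quadratic coupling force acting on w."""
    return k_a * c * w * v


def f2_v(w1, w2):
    """Quadratic force acting on v, symmetric in its two arguments."""
    return 0.5 * k_a * c * w1 * w2


def f3_w(w):
    """Cubic force acting on w."""
    return 0.5 * k_a * c**2 * w**3


def condensed_v(w):
    """Static condensation, cf. (4.51): in-plane inertia neglected, p_v = 0."""
    return -f2_v(w, w) / K_vv


def condensed_force(w):
    """Internal force of the condensed out-of-plane equation, cf. (4.52)."""
    return K_ww * w + f2_w(w, condensed_v(w)) + f3_w(w)


# For this toy model the condensed cubic stiffness has a closed form,
# smaller than the uncondensed value 0.5*k_a*c**2.
k3_eff = 0.5 * c**2 * k_a * k_s / (k_a + k_s)

w = 0.3
print(condensed_force(w), K_ww * w + k3_eff * w**3)
```

In this sketch the condensed axial DOF lowers the cubic stiffness seen by `w`, which is the "correction" of the cubic term mentioned above.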


4.5.2 Modal derivatives 


Note that equation (4.51) implies a nonlinear (in this case quadratic) constraint between
master (i.e. kept) coordinates w and slave (i.e. condensed) coordinates v. By contrast,
the approach proposed in Section 4.4.1 relies on a linear combination of modes
over the entire set of DOFs, without partitioning them into slow and fast, which can be
difficult for complex geometries. As already discussed, the reduction basis V should
span the solution accurately. For the cases we are tackling, this means properly
representing the stretching induced by the out-of-plane dominated modes. One could
then think of introducing high-frequency axial modes in V to enrich the subspace.
While straightforward, this strategy presents two issues: (i) ideally, all modes
of the system need to be computed, which is computationally very intensive for large
systems; (ii) there is no criterion to select the proper vibration modes to reproduce the
nonlinear coupling effectively. Typically, a reduction basis of large size results from
this approach; see for instance [19].

One alternative way to achieve reduction and properly account for nonlinear 
stretching effects is through the use of Modal Derivatives (MDs) in the reduction basis. 

MDs were originally proposed in [26] and [27], and more recently in [67]. Essentially,
the MDs are mode shapes stemming from a pre-selected basis of vibration
modes, assumed to accurately represent the dynamics of the underlying linearized
system. The MDs reproduce the most important deformation shapes resulting from
finite deflections in the direction of the dominant vibration modes. As such, they
complement the set of vibration modes initially selected in the MDM. In their original
definition, MDs are computed by differentiating the eigenvalue problem (4.4) with respect
to the modal amplitudes, as


$$
\left(K - \omega_i^2 M\right)\frac{\partial \phi_i}{\partial q_j}
+ \left(\frac{\partial K}{\partial q_j} - \frac{\partial \omega_i^2}{\partial q_j} M\right)\phi_i = 0, \tag{4.53}
$$

where $\theta_{ij} = \partial \phi_i / \partial q_j$ is the “MD of vibration mode $\phi_i$ with respect to
mode $\phi_j$”. The above definition requires the computation of the derivative of the
eigenfrequency. Moreover, the system (4.53) is singular and requires special treatment to be
solved; see [31] for a detailed discussion. Regardless of the method adopted to solve (4.53),
a high-dimensional matrix needs to be factorized for each MD, and this leads to high
computational cost. A computationally cheaper and more theoretically sound definition of MDs
resorts to a static version of (4.53), where all the mass-related contributions are ignored;
see [67] and [69] for details. When the inertial terms are ignored, (4.53) becomes

$$
K\,\frac{\partial \phi_i}{\partial q_j} = -\frac{\partial K}{\partial q_j}\,\phi_i. \tag{4.54}
$$


Note that the coefficient matrix $K$ is already available, and therefore each MD requires
only a new right-hand side. MDs computed according to (4.54) are usually referred to as
static MDs (SMDs). It can be proven that SMDs are symmetric, i.e.

$$
\frac{\partial \phi_i}{\partial q_j} = \frac{\partial \phi_j}{\partial q_i}. \tag{4.55}
$$


The first five vibration modes and the corresponding MDs for a flat rectangular plate 
are shown in Figures 4.3 and 4.4, taken from [31]. Note that, while the vibration modes 
(associated with the lowest eigenfrequencies) feature out-of-plane displacements 
only, the MDs are in-plane, and represent the displacement field that one has to add 
to a given vibration mode to account for the geometric nonlinearity. This separation 
is of course due to the flatness of the system. For more complex systems for which it 
is not possible to distinguish between in-plane and out-of-plane DOFs, however, the 
interpretation of MDs still holds. For instance, one could look at [62] for an application 
of MDs to the case of a shallow arch structure. 
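As an illustration of (4.54) and (4.55), the following minimal sketch computes static MDs for a small synthetic model. Everything in it is an assumption made for the example: the model is a random conservative system with internal force f(u) = K0 u + 0.5 T:(u,u) and a fully symmetric tensor `T`, the mass matrix is taken as the identity, and the derivative of the tangent stiffness is obtained by a central finite difference, in the spirit of a non-intrusive evaluation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                                   # full size, number of retained modes

# Hypothetical conservative model: f(u) = K0 u + 0.5 * T : (u, u), T fully symmetric
A = rng.standard_normal((n, n))
K0 = A @ A.T + n * np.eye(n)                  # symmetric positive definite linear stiffness
T = rng.standard_normal((n, n, n))
T = sum(np.transpose(T, p) for p in itertools.permutations(range(3))) / 6.0


def tangent_stiffness(u):
    """K(u) = df/du for the toy model above."""
    return K0 + np.einsum("ijk,k->ij", T, u)


# Vibration modes for a unit mass matrix (assumption): K0 phi = omega^2 phi
_, Phi = np.linalg.eigh(K0)
Phi = Phi[:, :m]


def static_md(i, j, h=1e-3):
    """Solve K0 theta_ij = -(dK/dq_j) phi_i, cf. (4.54), with a central difference."""
    dK = (tangent_stiffness(h * Phi[:, j]) - tangent_stiffness(-h * Phi[:, j])) / (2 * h)
    return np.linalg.solve(K0, -dK @ Phi[:, i])


theta = {(i, j): static_md(i, j) for i in range(m) for j in range(m)}

# Reduction basis: the m modes plus the m(m+1)/2 independent static MDs
V = np.column_stack([Phi] + [theta[i, j] for i in range(m) for j in range(i, m)])
print(V.shape)
```

In a real FE setting the factorization of the stiffness matrix would be computed once and reused for every right-hand side; `np.linalg.solve` is used here only to keep the sketch short.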




Figure 4.3: First three vibration modes of a rectangular plate, simply supported at its short edges 
[31]. Note that all modes feature out-of-plane displacement components only. 


Once a set of m dominant vibration modes is selected, and the corresponding MDs are 
computed, the reduction basis V can be formed as 


$$
V = [\phi_1, \dots, \phi_i, \dots, \phi_m, \theta_{11}, \dots, \theta_{ij}, \dots, \theta_{mm}], \tag{4.56}
$$


with $i = 1, \dots, m$ and $j = i, \dots, m$. Due to the symmetry property (4.55), the number of MDs
that can be generated is $m(m + 1)/2$. Therefore, the size of the reduction basis grows
with $m^2$. However, there are heuristic criteria to select the most relevant MDs and
reduce the associated NLROM size significantly; see [64] and [31]. Note also that the
computation of the MDs requires the Hessian of the stiffness matrix in the direction of the
vibration modes. In case the element formulation is accessible, this can be computed
at element level analytically and then assembled. This would then classify the MDs
method as intrusive. However, it is also possible to compute MDs through a finite
difference procedure (and therefore non-intrusively) using only FE assembly-level
information typically made available in commercial codes, as outlined in [61]. As a note,
MDs have been successfully used for flexible multibody applications, where nonlinear
elastic deflections are superimposed on large rigid body motions [68, 69].

Figure 4.4: Modal derivatives corresponding to the first five vibration modes depicted in Figure 4.3
[31]. Due to the flatness of the structure, the MDs feature in-plane displacements only, and provide the
displacement field necessary to account for the geometric nonlinearity.


4.5.3 Dual modes 


Another way to account for the transverse-membrane coupling is by appending the so-called
Dual Modes (DMs) to the dominant linear vibration modes in the reduction basis [38].
The main idea of DMs is based on applying a series of nonlinear static forces
to the HFM to obtain related nonlinear displacements. These displacements are then
orthogonalized with respect to the linear vibration modes, to yield shapes that represent
the nonlinear contribution. The most common procedure for computing the DMs
is described in [47, 52] and is briefly outlined here.

We start from a set of mass-normalized vibration modes $V_{VM} = [\phi_1, \dots, \phi_m]$,
$m < n$, typically selected by the MDM (see Section 4.2). The procedure then selects
a dominant vibration mode $\phi_d$, where $d \in M$ is the index of the mode with
the maximum load participation factor, i.e.

$$
d = \arg\max_{i \in M} \left| p^T \phi_i \right|, \tag{4.57}
$$

where $p$ is the external force in (4.30). Then we form a set of scaling factors
$a = [a_1, \dots, a_r]$, $a \in \mathbb{R}^r$, such that the static solutions $x_i$ of

$$
f(x_i) - a_i K \phi_d = 0 \tag{4.58}
$$

trigger displacements ranging from linear to strongly nonlinear. That is to say,
$O(\|f(x_i) - K x_i\|) \ll O(\|K x_i\|)$ for small $|a_i|$ and $O(\|f(x_i) - K x_i\|) \approx O(\|K x_i\|)$
for large $|a_i|$, respectively. Let $R = \{1, \dots, r\}$ be the set of indices associated with $a$.
It is now possible to create load sets,


G = diaga k( ae je M, (4.59) 


and let $g_i^{(j)}$ be the $i$th column of $G^{(j)}$. Now, for each $j \in M$:
1. Solve the static problems

$$
f(x_i^{(j)}) - g_i^{(j)} = 0, \quad i \in R, \tag{4.60}
$$

and collect the obtained solutions in the matrix $X^{(j)}$ as

$$
X^{(j)} = [x_1^{(j)}, \dots, x_r^{(j)}]. \tag{4.61}
$$

2. Mass-orthogonalize each column of $X^{(j)}$ to $V_{VM}$, i.e.

$$
\hat{x}_i^{(j)} = x_i^{(j)} - \sum_{h=1}^{m} \left(\phi_h^T M x_i^{(j)}\right) \phi_h, \quad i \in R, \tag{4.62}
$$

and collect them into $\hat{X}^{(j)}$ as

$$
\hat{X}^{(j)} = [\hat{x}_1^{(j)}, \dots, \hat{x}_r^{(j)}]. \tag{4.63}
$$

This step is performed to exclude the linear contribution from the static solutions $x_i^{(j)}$.
3. Perform an SVD of $\hat{X}^{(j)}$, and retain the singular vectors associated with the largest
$k < r$ singular values, as

$$
[P^{(j)}, S^{(j)}, R^{(j)}] = \mathrm{SVD}(\hat{X}^{(j)}), \tag{4.64}
$$

$$
P^{(j)} = [\psi_1^{(j)}, \dots, \psi_k^{(j)} \mid \dots], \quad
\sigma_1 > \sigma_2 > \cdots > \sigma_k > \sigma_{k+1} > \cdots > \sigma_r, \tag{4.65}
$$

where $\sigma_i$ is the $i$th diagonal component of $S^{(j)}$.
4. Compute the linearized strain energy measures $E_l^{(j)}$ of $\hat{X}^{(j)}$ associated with each
column of $P^{(j)}$ (i.e. with each dual mode candidate), as


G) 
-G KYA, gP- ord (4.66) 


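which are collected in a vector $E^{(j)} = [E_1^{(j)}, \dots, E_k^{(j)}]$.
5. Select the most relevant $s < k$ DMs $\psi_l^{(j)}$, $l = 1, \dots, s$, as the columns of $P^{(j)}$
associated with the largest entries of $E^{(j)}$, i.e.

$$
E_1^{(j)} > E_2^{(j)} > \cdots > E_s^{(j)}, \tag{4.67}
$$

and let $V_{DM}^{(j)} = [\psi_1^{(j)}, \dots, \psi_s^{(j)}]$.
6. Move to the next load set: $j \leftarrow j + 1$.

The total collection of selected DMs is then $V_{DM} = [V_{DM}^{(1)}, \dots, V_{DM}^{(j)}, \dots]$, $j \in M$.
The reduction basis $V$ is formed as

$$
V = [V_{VM} \;\; V_{DM}]. \tag{4.68}
$$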


Note that the computation of the DMs is intrinsically non-intrusive, as only nonlinear
static solutions of the HFM are required. Also, the procedure just described is
load-dependent, meaning that the shape of the applied external load must be known.
Typically, the computation of DMs requires a careful selection of the scaling
factors a to trigger the nonlinearity at the desired level.
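Steps 2 to 5 of the procedure above amount to a few lines of linear algebra once the static solutions of one load set are available. The sketch below is only illustrative. The function name `select_dual_modes`, its arguments and the random test data are hypothetical. The energy measure used in step 4 (the linearized strain energy of each candidate weighted by its squared singular value) is an assumed form chosen for the example, not necessarily the expression of (4.66). The candidates kept in step 3 are those with the largest singular values.

```python
import numpy as np


def select_dual_modes(X, V_vm, M, K, k, s):
    """Steps 2 to 5 for one load set; X holds the nonlinear static solutions column-wise."""
    # Step 2: remove the part of each solution lying in span(V_vm), in the M inner product
    X_hat = X - V_vm @ (V_vm.T @ M @ X)
    # Step 3: SVD; NumPy returns the singular values in decreasing order
    P, sigma, _ = np.linalg.svd(X_hat, full_matrices=False)
    P, sigma = P[:, :k], sigma[:k]
    # Step 4 (assumed form): linearized strain energy of each candidate, weighted by sigma^2
    E = sigma**2 * np.einsum("il,ij,jl->l", P, K, P)
    # Step 5: keep the s candidates with the largest energy measure
    keep = np.argsort(E)[::-1][:s]
    return P[:, keep]


# Small synthetic check (random data, not an actual HFM)
rng = np.random.default_rng(0)
n, m, r = 8, 2, 4
mass = rng.uniform(1.0, 2.0, n)
M = np.diag(mass)
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)

# Mass-normalized modes of (K, M) through the symmetric standard form
_, Y = np.linalg.eigh(K / np.sqrt(np.outer(mass, mass)))
V_vm = (Y / np.sqrt(mass)[:, None])[:, :m]

X = rng.standard_normal((n, r))        # stand-in for the nonlinear static solutions
V_dm = select_dual_modes(X, V_vm, M, K, k=3, s=2)
V = np.column_stack([V_vm, V_dm])      # reduction basis, cf. (4.68)
```

In an actual application `X` would come from the nonlinear static solves of step 1, and the loop over the load sets would concatenate the selected dual modes.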


4.5.4 Proper orthogonal decomposition 


As an alternative to model-based techniques, a reduction basis for Galerkin projection
can be constructed from data obtained by sampling HFM solutions. The most popular of such
methods is the Proper Orthogonal Decomposition (POD) [2, 35, 36, 41], also known as
Karhunen-Loève decomposition or principal component analysis (PCA). This topic is
extensively treated in [8, Chapter 2]. Here we just discuss the main idea.

Let us assume that we have collected $n_t$ time snapshots of an HFM solution in a matrix
$U \in \mathbb{R}^{n \times n_t}$, $U := [u_1, \dots, u_{n_t}]$. The POD seeks a lower-dimensional
representation of $U$ by a basis $V \in \mathbb{R}^{n \times m}$, $V = [v_1, \dots, v_m]$, $m \ll n$, by
solving the following least squares problem:

$$
\min_{v_i \in \mathbb{R}^n} \sum_{j=1}^{n_t} \left\| u_j - \sum_{i=1}^{m} (u_j^T v_i)\, v_i \right\|_2^2. \tag{4.69}
$$

It can be shown that the vectors $v_i$ are the left singular vectors of $U$, obtained by the
Singular Value Decomposition (SVD) of $U$ as

$$
U = L S R^T, \tag{4.70}
$$

where $L = [l_1, \dots, l_n]$, $R = [r_1, \dots, r_{n_t}]$, $LL^T = I \in \mathbb{R}^{n \times n}$,
$RR^T = I \in \mathbb{R}^{n_t \times n_t}$ and $S$ is a rectangular matrix containing the singular values
on the diagonal, and zeros elsewhere. For details on algorithms to compute the SVD, refer for
instance to [23]. The singular values represent the relative importance of the corresponding
columns of $L$ in forming the basis $V$. If the singular values $S_{ii}$ and the corresponding left and
right singular vectors $l_i$ and $r_i$ are sorted in decreasing order, i.e. $S_{11} \geq S_{22} \geq \cdots$,
it can be shown that the error norm is given by

$$
\sum_{j=1}^{n_t} \left\| u_j - \sum_{i=1}^{m} (u_j^T l_i)\, l_i \right\|_2^2 = \sum_{i=m+1}^{n_t} S_{ii}^2. \tag{4.71}
$$


In other words, the left singular vectors $l_1, \dots, l_m$ corresponding to the highest
singular values are the most significant for constructing a reduction basis. In Figure 4.5, we
show an example of a POD analysis for the nonlinear dynamic response of a plate.
Note that, in this context, the right singular vectors $r_i$ represent the (normalized) time
evolution of the corresponding left singular vectors. Since an SVD can be performed for any
set of snapshots of any general nonlinear problem, POD is widespread as a versatile
method to form a reduction basis. It should be noted, however, that the POD basis is
optimal only for the set of snapshots which were used to generate it, while there is
no guarantee on the accuracy of the NLROM outside the training data. For this reason,
NLROMs based on POD are used in parametric contexts, i.e. when the HFM is
parametrized with respect to material, geometric and load properties. In these cases,
POD bases are obtained for different instances of the parameter set. Then parametric
NLROMs are formed by either building a larger POD basis that spans the solution
within the parameter range of interest [5], or by interpolating local NLROMs online
for the desired values of the parameters which were not previously sampled [3]. In this
framework, the high computational cost associated with the HFM solutions required by
the POD is amortized by the large number of queries to the NLROM. For further
discussion of this matter, we refer to Section 4.6.
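The construction of a POD basis and the error identity (4.71) can be checked numerically with a short sketch. The snapshot matrix below is synthetic (three made-up spatial shapes with random amplitudes plus small noise) and only stands in for actual HFM snapshots.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_t, m = 50, 20, 3

# Synthetic snapshots (illustrative only): three dominant spatial shapes plus small noise
shapes = rng.standard_normal((n, 3))
amps = rng.standard_normal((3, n_t)) * np.array([[10.0], [5.0], [2.0]])
U = shapes @ amps + 1e-3 * rng.standard_normal((n, n_t))

# Thin SVD, U = L S R^T; the singular values come sorted in decreasing order
L, S, Rt = np.linalg.svd(U, full_matrices=False)
V = L[:, :m]                                  # POD basis: first m left singular vectors

# Left-hand side of (4.71): squared projection error summed over all snapshots
proj_error = np.sum((U - V @ (V.T @ U)) ** 2)
# Right-hand side of (4.71): sum of the squared discarded singular values
tail = np.sum(S[m:] ** 2)
print(proj_error, tail)
```

A common way to choose `m` in practice is to inspect the decay of `S` and truncate where the discarded energy `tail` becomes negligible with respect to `np.sum(S**2)`.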


4.5.5 Non-intrusive methods 


As already mentioned, non-intrusive model order reduction approaches are beneficial
when the HFM is an FE model that gives no access to the element forces and Jacobians
required for the analytical computation of the reduced nonlinear internal forces. There
are various strategies to develop a nonlinear reduced order model non-intrusively.
We outline here the most common methods.

Figure 4.5: Example of a POD applied to a nonlinear structural dynamics case. A flat plate is simply
supported on three sides, and loaded with a step pressure, as shown. The HFM linear and nonlinear
responses at the mid node of the free side of the plate are shown. The marked difference between the two
responses highlights the significance of the nonlinear term. The first three left and right singular
vectors, and the corresponding singular values, are shown at the bottom of the figure. Note the
resemblance of the dominant first left vector $l_1$, and its corresponding time evolution $r_1$, with a random
snapshot of the HFM (top left corner).


4.5.5.1 Enforced displacement 


One way to identify the nonlinear stiffness coefficients of a reduced order model is
to prescribe a set of nonlinear static displacements to the HFM and solve for the
corresponding reaction forces that generate them. This method, which is known as
Enforced Displacement (ED) or STiffness Evaluation Procedure (STEP), was first developed
by Muravyov et al. [48] and later extended by Kim et al. [38] for the case of unknown
linear stiffness components. Essentially, ED first collects the reaction forces due to
statically enforced displacements, projects them onto the modal space, and sets up linear
equations to obtain the nonlinear stiffness coefficients $\tilde{K}^{(2)}_{ijl}$ and $\tilde{K}^{(3)}_{ijlp}$,
$i, j, l, p = 1, \dots, m$.

Let us consider the reduced nonlinear restoring force equation (4.40) already
introduced in Section 4.4.1.3, which reads

$$
\tilde{f}_i = \tilde{K}^{(1)}_{ij} q_j + \tilde{K}^{(2)}_{ijl} q_j q_l + \tilde{K}^{(3)}_{ijlp} q_j q_l q_p, \tag{4.72}
$$

with $i = 1, 2, \dots, m$, where summation over repeated indices is implied. The second-order
tensor $\tilde{K}^{(1)}$ can be immediately found to be

$$
\tilde{K}^{(1)}_{ij} = V_i^T K V_j, \tag{4.73}
$$

where $K$ is the stiffness matrix of the HFM, usually obtainable from any FE program.
Clearly, $\tilde{K}^{(1)}_{ij} = \tilde{K}^{(1)}_{ji}$.

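The ED method sequentially sets up linear systems of equations whose unknowns are
$\tilde{K}^{(2)}_{ijl}$ and $\tilde{K}^{(3)}_{ijlp}$. The first step consists of choosing a modal amplitude
$\pm q_r$, $r = 1, \dots, m$, and assigning it to the reduced static problem as

$$
\tilde{K}^{(1)}_{ir} q_r + \tilde{K}^{(2)}_{irr} q_r^2 + \tilde{K}^{(3)}_{irrr} q_r^3 = \tilde{p}_i^{(+r)}, \tag{4.74a}
$$

$$
-\tilde{K}^{(1)}_{ir} q_r + \tilde{K}^{(2)}_{irr} q_r^2 - \tilde{K}^{(3)}_{irrr} q_r^3 = \tilde{p}_i^{(-r)}, \tag{4.74b}
$$

with $i = 1, \dots, m$ (no summation over $r$). The right-hand sides $\tilde{p}_i^{(+r)}$ and $\tilde{p}_i^{(-r)}$
are given by

$$
\tilde{p}_i^{(+r)} = V_i^T f(u^{(+r)}), \tag{4.75}
$$

$$
\tilde{p}_i^{(-r)} = V_i^T f(u^{(-r)}), \tag{4.76}
$$

where $u^{(+r)} = V_r q_r$ and $u^{(-r)} = -V_r q_r$. In words, $u^{(+r)}$ and $u^{(-r)}$ are imposed statically
on the HFM, and the resulting reaction forces are projected onto $V_i$.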
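Likewise, we can select pairs of modal amplitudes $(q_r, q_s)$, $(-q_r, -q_s)$ and $(q_r, -q_s)$
and impose them on the reduced static problem as

$$
\begin{aligned}
&\tilde{K}^{(1)}_{ir} q_r + \tilde{K}^{(1)}_{is} q_s + \tilde{K}^{(2)}_{irr} q_r^2 + \tilde{K}^{(2)}_{irs} q_r q_s + \tilde{K}^{(2)}_{iss} q_s^2 \\
&\quad + \tilde{K}^{(3)}_{irrr} q_r^3 + \tilde{K}^{(3)}_{irrs} q_r^2 q_s + \tilde{K}^{(3)}_{irss} q_r q_s^2 + \tilde{K}^{(3)}_{isss} q_s^3 = \tilde{p}_i^{(+r,+s)},
\end{aligned} \tag{4.77a}
$$

$$
\begin{aligned}
&-\tilde{K}^{(1)}_{ir} q_r - \tilde{K}^{(1)}_{is} q_s + \tilde{K}^{(2)}_{irr} q_r^2 + \tilde{K}^{(2)}_{irs} q_r q_s + \tilde{K}^{(2)}_{iss} q_s^2 \\
&\quad - \tilde{K}^{(3)}_{irrr} q_r^3 - \tilde{K}^{(3)}_{irrs} q_r^2 q_s - \tilde{K}^{(3)}_{irss} q_r q_s^2 - \tilde{K}^{(3)}_{isss} q_s^3 = \tilde{p}_i^{(-r,-s)},
\end{aligned} \tag{4.77b}
$$

$$
\begin{aligned}
&\tilde{K}^{(1)}_{ir} q_r - \tilde{K}^{(1)}_{is} q_s + \tilde{K}^{(2)}_{irr} q_r^2 - \tilde{K}^{(2)}_{irs} q_r q_s + \tilde{K}^{(2)}_{iss} q_s^2 \\
&\quad + \tilde{K}^{(3)}_{irrr} q_r^3 - \tilde{K}^{(3)}_{irrs} q_r^2 q_s + \tilde{K}^{(3)}_{irss} q_r q_s^2 - \tilde{K}^{(3)}_{isss} q_s^3 = \tilde{p}_i^{(+r,-s)},
\end{aligned} \tag{4.77c}
$$

where we restrict the case to $s > r$ (no summation over $r$ and $s$). The right-hand sides of
(4.77a)-(4.77c) are given by

$$
\tilde{p}_i^{(+r,+s)} = V_i^T f(u^{(+r,+s)}), \tag{4.78a}
$$

$$
\tilde{p}_i^{(-r,-s)} = V_i^T f(u^{(-r,-s)}), \tag{4.78b}
$$

$$
\tilde{p}_i^{(+r,-s)} = V_i^T f(u^{(+r,-s)}), \tag{4.78c}
$$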


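where, analogously to (4.75), we statically impose on the HFM $u^{(+r,+s)} = V_r q_r + V_s q_s$,
$u^{(-r,-s)} = -V_r q_r - V_s q_s$ and $u^{(+r,-s)} = V_r q_r - V_s q_s$. Lastly, we need to compute
$\tilde{K}^{(3)}_{irsp}$, $r \neq s \neq p$. Similarly to the previous steps, we choose a triple
$(q_r, q_s, q_p)$ and insert it into the reduced static problem. This results in a single equation
for $\tilde{K}^{(3)}_{irsp}$:

$$
\begin{aligned}
&\tilde{K}^{(1)}_{ir} q_r + \tilde{K}^{(1)}_{is} q_s + \tilde{K}^{(1)}_{ip} q_p + \tilde{K}^{(2)}_{irr} q_r^2 + \tilde{K}^{(2)}_{iss} q_s^2 + \tilde{K}^{(2)}_{ipp} q_p^2 \\
&\quad + \tilde{K}^{(2)}_{irs} q_r q_s + \tilde{K}^{(2)}_{irp} q_r q_p + \tilde{K}^{(2)}_{isp} q_s q_p + \tilde{K}^{(3)}_{irrr} q_r^3 + \tilde{K}^{(3)}_{isss} q_s^3 + \tilde{K}^{(3)}_{ippp} q_p^3 \\
&\quad + \tilde{K}^{(3)}_{irrs} q_r^2 q_s + \tilde{K}^{(3)}_{irrp} q_r^2 q_p + \tilde{K}^{(3)}_{irss} q_r q_s^2 + \tilde{K}^{(3)}_{irpp} q_r q_p^2 + \tilde{K}^{(3)}_{issp} q_s^2 q_p + \tilde{K}^{(3)}_{ispp} q_s q_p^2 \\
&\quad + \tilde{K}^{(3)}_{irsp} q_r q_s q_p = \tilde{p}_i^{(r,s,p)},
\end{aligned} \tag{4.79}
$$

with $p > s > r$, and

$$
\tilde{p}_i^{(r,s,p)} = V_i^T f(u^{(r,s,p)}), \tag{4.80}
$$

where $u^{(r,s,p)} = V_r q_r + V_s q_s + V_p q_p$. Note that the restriction $s > r$ in (4.77a)-(4.77c) and
$p > s > r$ in (4.79) in fact sets $\tilde{K}^{(2)}_{irs} = 0\ \forall s < r$ and $\tilde{K}^{(3)}_{irsp} = 0\ \forall p < s < r$.
This is done to minimize the number of coefficients multiplying the same combinations of $q_r q_s$ and
$q_r q_s q_p$, $\forall r, s, p = 1, \dots, m$. Alternatively, one could set symmetry properties as

$$
\tilde{K}^{(2)}_{ijl} = \tilde{K}^{(2)}_{ilj} \tag{4.81}
$$

and

$$
\tilde{K}^{(3)}_{ijlp} = \tilde{K}^{(3)}_{ijpl} = \tilde{K}^{(3)}_{iljp} = \tilde{K}^{(3)}_{ilpj} = \tilde{K}^{(3)}_{ipjl} = \tilde{K}^{(3)}_{iplj} \tag{4.82}
$$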


and modify (4.77a)-(4.77c) and (4.79) accordingly. The total number $N_{ED}$ of nonlinear
static evaluations required to fully construct the nonlinear reduced restoring
forces is

$$
N_{ED} = 2m + 3\,{}_mC_2 + {}_mC_3, \tag{4.83}
$$

where

$$
{}_mC_r = \frac{m!}{(m - r)!\, r!}. \tag{4.84}
$$

Since this number is $O(m^3)$, the computational cost associated with ED grows quickly
with the number of retained modes and severely hinders the applicability of the
method.
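The first ED step, equations (4.74a)-(4.76), can be sketched on a synthetic model. In the example below the "HFM" is a hypothetical polynomial force with random tensors `T2` and `T3`, so that the identified coefficients can be compared with their exact projections. In practice `f` would be a black-box call to the FE code returning the reaction forces for an enforced displacement. Adding (4.74a) and (4.74b) isolates the quadratic coefficient, and subtracting them isolates the cubic one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)
T2 = 0.1 * rng.standard_normal((n, n, n))       # hypothetical quadratic force tensor
T3 = 0.1 * rng.standard_normal((n, n, n, n))    # hypothetical cubic force tensor


def f(u):
    """Stand-in for the HFM static internal force (a black box in practice)."""
    return (K @ u
            + np.einsum("ijk,j,k->i", T2, u, u)
            + np.einsum("ijkl,j,k,l->i", T3, u, u, u))


V, _ = np.linalg.qr(rng.standard_normal((n, m)))    # some reduction basis, columns V_r
K1 = V.T @ K @ V                                    # linear reduced stiffness, cf. (4.73)


def identify_single_mode(r, q):
    """Solve (4.74a)-(4.74b) for K2[i, r, r] and K3[i, r, r, r], for all i at once."""
    p_plus = V.T @ f(V[:, r] * q)        # (4.75)
    p_minus = V.T @ f(-V[:, r] * q)      # (4.76)
    k2 = (p_plus + p_minus) / (2.0 * q**2)
    k3 = ((p_plus - p_minus) / 2.0 - K1[:, r] * q) / q**3
    return k2, k3


k2, k3 = identify_single_mode(r=0, q=0.5)
print(k2, k3)
```

The mixed coefficients follow the same pattern with the pair and triple displacements of (4.77) and (4.79), at the price of many more static evaluations.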


4.5.5.2 Enhanced enforced displacement 


To reduce the computational costs of the ED method, Perez et al. [52] proposed a way to
identify the NLROM coefficients in the case that the tangent stiffness matrix is made
available by the FE program at hand. We refer to this method as Enhanced Enforced
Displacements (EED). The tangent stiffness matrix $K^t(u)$ is the Jacobian of the nonlinear
forces $f$ with respect to the nodal displacements, as

$$
K^t(u) = \frac{\partial f}{\partial u}. \tag{4.85}
$$

Let us assume that $K^t(u)$ is available. Then the reduced tangent stiffness matrix $\tilde{K}^t$ can
be obtained from $K^t$ as

$$
\tilde{K}^t = V^T K^t(Vq) V, \tag{4.86}
$$

whose $ir$ component is given by

$$
\tilde{K}^t_{ir} = V_i^T K^t(Vq) V_r, \tag{4.87}
$$

where we pose $u = Vq$. The reduced tangent stiffness matrix $\tilde{K}^t$ is the Jacobian of the
reduced nonlinear internal force vector $\tilde{f}$, namely

$$
\tilde{K}^t = \frac{\partial \tilde{f}}{\partial q} =
\begin{bmatrix}
\frac{\partial \tilde{f}_1}{\partial q_1} & \cdots & \frac{\partial \tilde{f}_1}{\partial q_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial \tilde{f}_m}{\partial q_1} & \cdots & \frac{\partial \tilde{f}_m}{\partial q_m}
\end{bmatrix}. \tag{4.88}
$$


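Then, following from (4.40), $\tilde{K}^t_{ir}$ reads

$$
\begin{aligned}
\tilde{K}^t_{ir} &= \frac{\partial}{\partial q_r}\left[\tilde{K}^{(1)}_{ij} q_j + \tilde{K}^{(2)}_{ijl} q_j q_l + \tilde{K}^{(3)}_{ijlp} q_j q_l q_p\right] \\
&= \tilde{K}^{(1)}_{ir} + \left[\tilde{K}^{(2)}_{ijr} + \tilde{K}^{(2)}_{irj}\right] q_j
+ \left[\tilde{K}^{(3)}_{ijlr} + \tilde{K}^{(3)}_{ijrl} + \tilde{K}^{(3)}_{irjl}\right] q_j q_l.
\end{aligned} \tag{4.89}
$$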

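The EED method equates (4.89) and (4.87) for as many combinations of modal amplitudes
as those required to solve for the reduced stiffness coefficients $\tilde{K}^{(2)}_{ijr}$ and
$\tilde{K}^{(3)}_{ijlr}$, $i, j, r, l = 1, \dots, m$. Similarly to the ED method, first two static
displacement vectors for each column of the reduction basis are constructed and inserted in
(4.86). We can then set two equations as

$$
\tilde{K}^{t(a)}_{ir} = \tilde{K}^{(1)}_{ir} + \left[\tilde{K}^{(2)}_{ijr} + \tilde{K}^{(2)}_{irj}\right] q_j^{(a)}
+ \left[\tilde{K}^{(3)}_{ijjr} + \tilde{K}^{(3)}_{ijrj} + \tilde{K}^{(3)}_{irjj}\right] \left(q_j^{(a)}\right)^2, \quad a = 1, 2, \tag{4.90}
$$

with no summation over $j$. Since the elements $\tilde{K}^{(2)}_{ijl}$ and $\tilde{K}^{(3)}_{ijlp}$ were assumed to be
zero unless $p \geq l \geq j$, (4.90) splits into three different cases, namely

$$
\tilde{K}^{t(a)}_{ir} = \tilde{K}^{(1)}_{ir} + \tilde{K}^{(2)}_{ijr} q_j^{(a)} + \tilde{K}^{(3)}_{ijjr} \left(q_j^{(a)}\right)^2, \quad \text{if } j < r, \tag{4.91a}
$$

$$
\tilde{K}^{t(a)}_{ir} = \tilde{K}^{(1)}_{ir} + 2\tilde{K}^{(2)}_{irr} q_r^{(a)} + 3\tilde{K}^{(3)}_{irrr} \left(q_r^{(a)}\right)^2, \quad \text{if } j = r, \tag{4.91b}
$$

$$
\tilde{K}^{t(a)}_{ir} = \tilde{K}^{(1)}_{ir} + \tilde{K}^{(2)}_{irj} q_j^{(a)} + \tilde{K}^{(3)}_{irjj} \left(q_j^{(a)}\right)^2, \quad \text{if } j > r. \tag{4.91c}
$$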


The solutions of (4.90) and (4.91a)-(4.91c) yield the coefficients $\tilde{K}^{(2)}_{ijr}$, $\tilde{K}^{(3)}_{ijjr}$,
$\tilde{K}^{(3)}_{ijrr}$ and $\tilde{K}^{(3)}_{irrr}$, $i, j, r = 1, \dots, m$. At this point, the only set of
coefficients to be identified is $\tilde{K}^{(3)}_{ijlr}$, $j \neq l \neq r$, $j, l, r = 1, \dots, m$. To calculate
these coefficients one can use displacement vectors which are formed from combinations of two
different columns of $V$ by

$$
u = V_j q_j + V_l q_l, \quad j, l = 1, \dots, m, \tag{4.92}
$$

where $q_j$ and $q_l$ are prescribed. These displacements are again inserted into (4.86) to
obtain their corresponding tangent stiffness matrices. The $ir$th component of the
associated reduced tangent stiffness matrix is then given, for $j < l < r$, by

$$
\tilde{K}^t_{ir} = \tilde{K}^{(1)}_{ir} + \left[\tilde{K}^{(2)}_{ijr} q_j + \tilde{K}^{(2)}_{ilr} q_l\right]
+ \left[\tilde{K}^{(3)}_{ijjr} q_j^2 + \tilde{K}^{(3)}_{illr} q_l^2 + \tilde{K}^{(3)}_{ijlr} q_j q_l\right]. \tag{4.93}
$$

The only unknown in (4.93) is $\tilde{K}^{(3)}_{ijlr}$, which can be found without needing combinations
of three columns of $V$ as in the case of the ED method. As a result, EED is computationally
more efficient than ED. In fact, the total number of required static solutions
$N_{EED}$ is then

$$
N_{EED} = 2m + {}_mC_2, \tag{4.94}
$$

which is $O(m^2)$, as opposed to $O(m^3)$ for the ED method. Note that the amplitudes
$q_i$, $i = 1, \dots, m$, are user defined. Typically, they are selected such that the resulting
displacements generate nonlinear forces of the desired magnitude. For shell-like
structures, it is usually suggested to select $q_i$, $i = 1, \dots, m$, in the order of one thickness of
the structure for transverse-dominated modes and 0.1 to 0.01 of that for
membrane-dominated ones. A flowchart of the ED and EED methods is shown in Figure 4.6.
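The single-mode case (4.91b) can be illustrated in the same spirit as the ED step. The sketch below again uses a hypothetical polynomial model (random tensors `T2`, `T3`) whose tangent stiffness is available in closed form. The function `tangent` stands in for the tangent stiffness matrix exported by an FE code. Two evaluations at amplitudes `+qj` and `-qj` separate the part of the reduced tangent that is linear in the amplitude from the part that is quadratic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 2
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)
T2 = 0.1 * rng.standard_normal((n, n, n))       # hypothetical quadratic force tensor
T3 = 0.1 * rng.standard_normal((n, n, n, n))    # hypothetical cubic force tensor


def tangent(u):
    """Stand-in for the tangent stiffness K^t(u) = df/du exported by an FE code."""
    k2 = np.einsum("ijk,k->ij", T2, u) + np.einsum("ikj,k->ij", T2, u)
    k3 = (np.einsum("ijkl,k,l->ij", T3, u, u)
          + np.einsum("ikjl,k,l->ij", T3, u, u)
          + np.einsum("iklj,k,l->ij", T3, u, u))
    return K + k2 + k3


V, _ = np.linalg.qr(rng.standard_normal((n, m)))
K1 = V.T @ K @ V


def reduced_tangent(q):
    """Reduced tangent stiffness, cf. (4.86)."""
    return V.T @ tangent(V @ q) @ V


j, qj = 0, 0.5
e = np.zeros(m)
e[j] = qj
Kt_plus, Kt_minus = reduced_tangent(e), reduced_tangent(-e)

odd = (Kt_plus - Kt_minus) / (2.0 * qj)             # part linear in q_j, entry (i, r)
even = ((Kt_plus + Kt_minus) / 2.0 - K1) / qj**2    # part quadratic in q_j, entry (i, r)

K2_jj = odd[:, j] / 2.0     # (4.91b): the odd part of column j equals 2*K2[i, j, j]
K3_jjj = even[:, j] / 3.0   # (4.91b): the even part of column j equals 3*K3[i, j, j, j]
print(K2_jj, K3_jjj)
```

The remaining columns of `odd` and `even` carry the mixed coefficients of (4.91a) and (4.91c), which is why each tangent evaluation delivers more information than a single reaction force vector.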


4.5.5.3 Implicit condensation 


In order to identify the coefficients of (4.40) via either the ED or the EED method,
the reduction basis $V$ must contain vibration modes and corresponding
membrane-dominated vectors (e.g. MDs, DMs or manually selected high-frequency VMs). This
increases the size of the basis significantly and impacts the efficiency of the method.
However, it is possible to avoid the use of membrane-dominated modes by using the
method of Implicit Condensation (IC), or Applied Force method, developed by McEwan
et al. [45, 46]. Let $p^{(s)}$ be a generic load obtained by combining one, two and three
columns of $V$ with chosen scaling factors $\hat{q}_i$, $\hat{q}_j$, $\hat{q}_k$ as

$$
p^{(s)} = K\left(V_i(\pm\hat{q}_i) + V_j(\pm\hat{q}_j) + V_k(\pm\hat{q}_k)\right), \quad i, j, k = 0, 1, 2, \dots, m, \quad i \neq j \neq k, \tag{4.95}
$$

Figure 4.6: Flowchart comparing the ED and EED steps to identify the nonlinear stiffness
coefficients. The EED method manages to identify all the nonlinear stiffness coefficients in two steps only,
because of the availability of the full order tangent stiffness matrix [34].


where the index 0 implies that the corresponding term is ignored, i.e.
$K(V_0\hat{q}_0 + V_1\hat{q}_1 + V_2\hat{q}_2) = K(V_1\hat{q}_1 + V_2\hat{q}_2)$.
One can show that the total number of cases $N_{IC}$ that can be generated from (4.95) is

$$
N_{IC} = 2m + 4\,{}_mC_2 + 8\,{}_mC_3, \tag{4.96}
$$

and therefore $s = 1, \dots, N_{IC}$. The scaling factors $\hat{q}_i$, $i = 1, \dots, m$, should be chosen such
that the resulting displacements are in the nonlinear regime. Gordon and Hollkamp
[20] propose that a scaling factor $\hat{q}_i$ for (4.95) should generate a force that induces a
desired maximum displacement $w_{max}$, as


qi = =w, (4.97) 


a 
where $V_{i,max}$ denotes the component of mode $V_i$ which has the maximum out-of-plane
displacement, and $\omega_i$ is the eigenfrequency of mode $V_i$. The maximum desired
displacement $w_{max}$ is usually chosen in the order of the thickness of the structure to
properly exercise the nonlinearity.

Each load $p^{(s)}$ is applied to the static HFM as

$$
f(u^{(s)}) = p^{(s)}. \tag{4.98}
$$

The resulting displacement $u^{(s)}$ is then expressed in the subspace spanned by $V$
through the modal amplitudes $q^{(s)}$ as

$$
u^{(s)} = V q^{(s)} + \hat{u}^{(s)} \;\Rightarrow\; q^{(s)} = V^T M u^{(s)}, \tag{4.99}
$$

where $\hat{u}^{(s)}$ is the remainder and it is assumed that $V^T M \hat{u}^{(s)} = 0$. Likewise, $p^{(s)}$ is
projected on $V$ to give the modal loads $\tilde{p}^{(s)}$ as

$$
\tilde{p}^{(s)} = V^T p^{(s)}. \tag{4.100}
$$
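Then each pair $(\tilde{p}^{(s)}, q^{(s)})$ is inserted in the static reduced order problem to obtain

$$
\tilde{K}^{(2)}_{ijl}\, q_j^{(s)} q_l^{(s)} + \tilde{K}^{(3)}_{ijlp}\, q_j^{(s)} q_l^{(s)} q_p^{(s)} = \tilde{p}_i^{(s)} - \omega_i^2 q_i^{(s)}, \quad i = 1, 2, \dots, m, \tag{4.101}
$$

where it is further assumed that the symmetry properties (4.81) and (4.82) hold. All
equations resulting from (4.101) can be written in matrix form as

$$
G_{nl} K_{nl} = P_{nl}, \tag{4.102}
$$

where

$$
G_{nl}^T =
\begin{bmatrix}
(q_1^{(1)})^2 & \cdots & (q_1^{(N_{IC})})^2 \\
\vdots & & \vdots \\
q_i^{(1)} q_j^{(1)} & \cdots & q_i^{(N_{IC})} q_j^{(N_{IC})} \\
\vdots & & \vdots \\
(q_m^{(1)})^2 & \cdots & (q_m^{(N_{IC})})^2 \\
(q_1^{(1)})^3 & \cdots & (q_1^{(N_{IC})})^3 \\
\vdots & & \vdots \\
q_i^{(1)} q_j^{(1)} q_k^{(1)} & \cdots & q_i^{(N_{IC})} q_j^{(N_{IC})} q_k^{(N_{IC})} \\
\vdots & & \vdots \\
(q_m^{(1)})^3 & \cdots & (q_m^{(N_{IC})})^3
\end{bmatrix}, \tag{4.103}
$$

$$
K_{nl}^T =
\begin{bmatrix}
\tilde{K}^{(2)}_{111} & \cdots & \tilde{K}^{(2)}_{1ij} & \cdots & \tilde{K}^{(2)}_{1mm} & \tilde{K}^{(3)}_{1111} & \cdots & \tilde{K}^{(3)}_{1ijk} & \cdots & \tilde{K}^{(3)}_{1mmm} \\
\vdots & & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
\tilde{K}^{(2)}_{m11} & \cdots & \tilde{K}^{(2)}_{mij} & \cdots & \tilde{K}^{(2)}_{mmm} & \tilde{K}^{(3)}_{m111} & \cdots & \tilde{K}^{(3)}_{mijk} & \cdots & \tilde{K}^{(3)}_{mmmm}
\end{bmatrix} \tag{4.104}
$$

and

$$
P_{nl}^T =
\begin{bmatrix}
\tilde{p}_1^{(1)} - \omega_1^2 q_1^{(1)} & \cdots & \tilde{p}_1^{(N_{IC})} - \omega_1^2 q_1^{(N_{IC})} \\
\vdots & & \vdots \\
\tilde{p}_m^{(1)} - \omega_m^2 q_m^{(1)} & \cdots & \tilde{p}_m^{(N_{IC})} - \omega_m^2 q_m^{(N_{IC})}
\end{bmatrix}. \tag{4.105}
$$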


The system of equations (4.102) can be solved by means of any regression technique,
for instance the least squares method.
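The regression step can be sketched as follows. The data here are synthetic: the pairs of modal amplitudes and modal loads are generated from made-up "true" coefficients `C_true` and made-up squared eigenfrequencies `omega2`, rather than from nonlinear static HFM solutions, so that the fit can be verified. The helper `monomials` is hypothetical and stores one column per unique quadratic or cubic monomial, so any symmetry factor is absorbed in the fitted coefficients.

```python
import itertools
import numpy as np


def monomials(q):
    """Quadratic and cubic monomials of q, one per unique index combination."""
    m = len(q)
    quad = [q[j] * q[l]
            for j, l in itertools.combinations_with_replacement(range(m), 2)]
    cub = [q[j] * q[l] * q[p]
           for j, l, p in itertools.combinations_with_replacement(range(m), 3)]
    return np.array(quad + cub)


rng = np.random.default_rng(4)
m, n_cases = 2, 20
omega2 = np.array([1.0, 9.0])                   # squared eigenfrequencies (made-up values)
n_coeff = len(monomials(np.zeros(m)))           # 3 quadratic + 4 cubic terms for m = 2
C_true = rng.standard_normal((m, n_coeff))      # synthetic "true" nonlinear coefficients

Q = rng.uniform(-1.0, 1.0, size=(n_cases, m))   # modal amplitudes q^(s), one row per case
G = np.array([monomials(q) for q in Q])         # n_cases x n_coeff regression matrix
P_tilde = Q * omega2 + G @ C_true.T             # modal loads consistent with (4.101)

# Least-squares solution of G K_nl = P_nl, with P_nl = P_tilde - omega^2 q
P_nl = P_tilde - Q * omega2
K_nl, *_ = np.linalg.lstsq(G, P_nl, rcond=None)
print(K_nl.T)
```

With data from actual static solutions the fit is no longer exact, and the number and amplitude of the load cases determine how well conditioned the regression is.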

The main advantage of the IC method is the fact that the resulting NLROM is based
on vibration modes only, and thus is very compact as compared to those of the ED and EED
methods, which also require modes (e.g. DMs or MDs) that represent the membrane
behavior due to the large displacements. In the case of ED and EED, it is then possible
to directly retrieve a full displacement field, and from it compute strains and stresses.
This feature is crucial when the final aim of the analysis is to perform stress/strain
investigations (for instance for fatigue life estimation). On the contrary, the IC method
does not offer this possibility, as the membrane behavior is not present in the basis.

To overcome this issue, Hollkamp and Gordon [24] developed a post-processing
expansion procedure, which approximates the response of these DOFs without increasing
the number of required static solutions in IC. In this method, the total displacement
vector $u^{(s)}$ relative to load case $s$ can be written as

$$
u^{(s)} = V q^{(s)} + \hat{u}^{(s)}, \tag{4.106}
$$

where $u^{(s)}$ is the solution of (4.98), and $\hat{u}^{(s)}$ is the correction vector for in-plane DOFs,
to be found by the expansion procedure. We can express $\hat{u}^{(s)}$ in a modal fashion as

$$
\hat{u}^{(s)} = \hat{V} \hat{q}^{(s)}, \tag{4.107}
$$

where $\hat{V}$ is the transformation matrix and $\hat{q}^{(s)}$ is the vector of modal membrane
coordinates. The whole displacement set for $s = 1, \dots, N_{IC}$ can be compactly written as

$$
U = V Q + \hat{V} \hat{Q}, \tag{4.108}
$$

where $U = [u^{(1)}, \dots, u^{(N_{IC})}]$, $Q = [q^{(1)}, \dots, q^{(N_{IC})}]$, while $\hat{V}$ and $\hat{Q}$ are yet to be
determined. It is assumed that $V^T M \hat{V} = 0$, that is to say, the in-plane displacement to be
retrieved lies in a subspace which is $M$-orthogonal to $V$. Therefore,

$$
Q = V^T M U. \tag{4.109}
$$

It is then assumed that each column $\hat{q}^{(s)}$ of $\hat{Q}$ is given by

$$
\hat{q}^{(s)} = \left[(q_1^{(s)})^2, \; q_1^{(s)} q_2^{(s)}, \; \dots, \; q_j^{(s)} q_l^{(s)}, \; \dots, \; (q_m^{(s)})^2\right]^T; \tag{4.110}
$$

see [49] for more details. Then, by virtue of (4.110) and (4.109), we finally obtain

$$
\hat{V} = (I - V V^T M)\, U \hat{Q}^{+}, \tag{4.111}
$$

where $\hat{Q}^{+}$ is the pseudo-inverse of $\hat{Q}$ and $\hat{Q} \in \mathbb{R}^{\frac{m(m+1)}{2} \times N_{IC}}$.

In practice, the NLROM obtained by IC is integrated in time and the vectors of
modal coordinates $q$ at time samples $t_1, t_2, \dots, t_{n_t}$ are collected in $Q_T$ as

$$
Q_T = [q(t_1), \dots, q(t_{n_t})]. \tag{4.112}
$$

The total displacement $U_T = [u(t_1), \dots, u(t_{n_t})]$ is retrieved by

$$
U_T = V Q_T + \hat{V} \hat{Q}_T, \tag{4.113}
$$

where the column $\hat{q}(t_i)$ of $\hat{Q}_T$ relative to time step $t_i$ is given by

$$
\hat{q}(t_i) = \left[q_1^2(t_i), \; q_1(t_i) q_2(t_i), \; \dots, \; q_m^2(t_i)\right]^T. \tag{4.114}
$$
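A minimal numerical sketch of the expansion step (4.108)-(4.111) is given below. All data are synthetic: a random diagonal mass matrix, a mass-normalized basis `V`, and a "true" membrane basis `V_hat_true` used only to manufacture the static solutions `U`, so that the recovered matrix can be checked. The helper `quad_terms` is hypothetical and builds the quadratic monomials of (4.110).

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n, m, n_cases = 8, 2, 12
n_quad = m * (m + 1) // 2

mass = rng.uniform(1.0, 2.0, n)
M = np.diag(mass)                                 # hypothetical diagonal mass matrix

# M-orthonormal columns W (W^T M W = I), split into bending and membrane parts
Y, _ = np.linalg.qr(np.sqrt(mass)[:, None] * rng.standard_normal((n, m + n_quad)))
W = Y / np.sqrt(mass)[:, None]
V, V_hat_true = W[:, :m], W[:, m:]


def quad_terms(q):
    """Unique quadratic monomials of q, cf. (4.110)."""
    pairs = itertools.combinations_with_replacement(range(len(q)), 2)
    return np.array([q[j] * q[l] for j, l in pairs])


# Synthetic static solutions with a quadratic membrane response, cf. (4.108)
Q = rng.uniform(-1.0, 1.0, (m, n_cases))
Q_hat = np.column_stack([quad_terms(q) for q in Q.T])
U = V @ Q + V_hat_true @ Q_hat

# Expansion procedure
Q_id = V.T @ M @ U                                                  # (4.109)
Q_hat_id = np.column_stack([quad_terms(q) for q in Q_id.T])         # (4.110)
V_hat = (np.eye(n) - V @ V.T @ M) @ U @ np.linalg.pinv(Q_hat_id)    # (4.111)

# Post-processing of a reduced time history, cf. (4.113)-(4.114)
Q_T = rng.uniform(-1.0, 1.0, (m, 5))              # stand-in for integrated modal coordinates
U_T = V @ Q_T + V_hat @ np.column_stack([quad_terms(q) for q in Q_T.T])
```

The recovery is exact here only because the synthetic membrane response is exactly quadratic in the modal amplitudes; with real static solutions (4.111) is a least-squares fit.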


A comparison of the number of evaluations required by ED, EED and IC is shown in
Figure 4.7.

Figure 4.7: Number of required nonlinear static solutions for ED, EED and IC versus the number of
modes in the reduction basis [34].
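The three counts (4.83), (4.94) and (4.96) are easy to tabulate. The helper names below are hypothetical. Note that the IC count refers to a basis of vibration modes only, while the ED and EED counts refer to a basis that also contains membrane-dominated vectors, so the values of m are not directly comparable between the methods.

```python
from math import comb


def n_ed(m):
    """Static evaluations for ED, cf. (4.83)."""
    return 2 * m + 3 * comb(m, 2) + comb(m, 3)


def n_eed(m):
    """Static evaluations for EED, cf. (4.94)."""
    return 2 * m + comb(m, 2)


def n_ic(m):
    """Static load cases for IC, cf. (4.96)."""
    return 2 * m + 4 * comb(m, 2) + 8 * comb(m, 3)


for m in (2, 5, 10, 20):
    print(m, n_ed(m), n_eed(m), n_ic(m))
```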


4.5.6 Nonlinear normal modes 


As briefly discussed, the presence of nonlinearity may alter the stiffness of the system
(for geometric nonlinearities), the damping forces (e.g. with friction), or both. In fact,
it was shown that linear reduction techniques based on vibration modes only are not
sufficient to capture the nonlinear behavior, and therefore other vectors (e.g. MDs or
DMs) have to be added to the reduction basis. One could also wonder how a certain
eigensolution (i.e. a vibration mode oscillating at its own eigenfrequency) changes
when the displacements trigger the nonlinearity. For conservative autonomous systems
(i.e. systems with no damping and no external excitation), there still exist periodic
solutions, usually called Nonlinear Normal Modes (NNMs), which can be thought
of as the nonlinear equivalent of a vibration mode for a linear system. This concept
was first defined by Rosenberg [55] as “any vibration-in-unison of a conservative
nonlinear system, i.e. where the coordinates of the system pass through the equilibrium
and reach their extrema simultaneously”, and later generalized to include any, not
necessarily synchronous, periodic motion [37]. The latter definition allows the system
to exhibit internal resonance, i.e. the coexistence of two or more modes which feature
periodic motions with commensurate periods. Pioneering work on NNMs was
done by Shaw and Pierre [59, 60].

NNMs are instrumental in assessing the impact of the nonlinearity on the dynamic
response. In [40], NNMs are proposed as a tool to assess the quality of an NLROM. In this
work, it is in fact suggested to directly compare NNMs computed with NLROMs, as the
NNMs of the HFM are too costly to be computed for large systems. Usually, the effect
of an NNM is summarized by means of a Frequency-Energy Plot (FEP), where the
frequency of the periodic motion is plotted against the energy associated with the motion
itself, for instance resulting from initial conditions. Note that, in such a representation,
the actual shape of the motion throughout the period is hidden. In Figure 4.8 (taken
from [62]), the NNM associated with the first vibration mode of a shallow arch pinned at
both ends is shown. The response stays linear (i.e. of constant frequency) for a wide
range of energy. Then the frequency decreases due to a softening¹ behavior, followed
by an internal resonance tongue due to the interaction with the NNM associated with the
third vibration mode. At this point, in fact, the two NNMs evolve with a period ratio
of 4:1, enabling the existence of a periodic solution in which both contributions can
coexist. The motion then triggers a hardening behavior at higher energy. An FEP such as the
one shown in Figure 4.8 is computationally expensive to construct. In fact, every point
of the FEP requires the knowledge of a periodic motion for a given energy level. The
frequency and the initial conditions of such a motion are not known a priori. Numerical
techniques for solving this problem are discussed in detail in [51] and [28]. Here,
it is worth highlighting the benefits of resorting to NLROMs when computing NNMs.
In [62], for instance, MD-based ROMs were assessed for the computation of NNMs of
FE-discretized planar beam structures. In this work, the shooting method was used
to obtain periodic solutions of the autonomous conservative systems of interest. This
method essentially integrates the system in time for trial initial conditions (i.e. an initial
deformed shape imposed on the structure) and a trial period, and then corrects them via


1 The term “softening” (“hardening”) usually refers to a decrease (increase) of the slope of the
restoring nonlinear forces. For instance, a flat beam, clamped at both ends and loaded by a transverse
force at its mid-span, features a hardening stiffness: the bending associated with finite displacements
induces stretching in the cross-section, which in turn generates a restoring bending moment additional
to that due to linear effects.




[Figure 4.8 (plot): Frequency [rad/s] versus Energy [J] (logarithmic axis); curves for the
full model (87 DOF) and for GP ROMs with 1, 2 and 5 DOFs; insets (a) to (d) show
deformed shapes.]

Figure 4.8: First NNM of a shallow arch. Note that the NNM first exhibits softening (i. e. a decrease in
frequency due to a decrease in stiffness), followed by an internal resonance (4:1 interaction with
the third NNM), and then hardening. The FEP is computed for the full model (blue solid line), a ROM
with $V = [\phi_1]$, a ROM with $V = [\phi_1, \phi_3]$, and a ROM with
$V = [\phi_1, \phi_3, \theta_{11}, \theta_{13}, \theta_{33}]$.
Plots (a) to (d) show the spatial shape of the response (solid line) for the corresponding points on
the FEP. The undeformed configuration is shown with a dashed line [62].


Newton-Raphson iterations until a periodicity condition is met with a pre-defined accuracy.
Once a solution is obtained (i. e. a point of the FEP), continuation is employed
to estimate the new guess at a higher energy level. It becomes clear then that each point
of an FEP requires multiple time integrations of the system, and therefore, the tracing
of an FEP for each mode of interest can be a daunting task. In [62], it was demonstrated
that MD-based NLROMs capture well the nonlinear behavior and internal resonances
by including only the vibration modes, and corresponding MDs, that are expected
to interact. This is achieved at a smaller computational cost than the one required for
the HFM. Moreover, an NLROM so constructed filters out the physically meaningless
contribution of high-frequency dynamics, which is only due to the resolution of the spatial
discretization. In doing so, the convergence of numerical implicit time integration is
significantly improved.
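
The shooting and continuation steps described above can be sketched on a toy problem. The
following is a minimal illustration, not the procedure of [62]: it assumes a single-DOF
hardening Duffing oscillator (a hypothetical stand-in for the HFM or NLROM), fixes the
initial velocity to zero so that only the trial period is corrected by Newton-Raphson, and
uses plain sequential continuation in amplitude. All names are illustrative.

```python
import math

def accel(x):
    # Hypothetical conservative system: x'' + x + x**3 = 0 (hardening Duffing oscillator)
    return -x - x**3

def integrate(x0, period, n_steps=1000):
    """Integrate from (x0, v=0) over one trial period with the classical RK4 scheme."""
    h = period / n_steps
    x, v = x0, 0.0
    for _ in range(n_steps):
        k1x, k1v = v, accel(x)
        k2x, k2v = v + 0.5 * h * k1v, accel(x + 0.5 * h * k1x)
        k3x, k3v = v + 0.5 * h * k2v, accel(x + 0.5 * h * k2x)
        k4x, k4v = v + h * k3v, accel(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x, v

def shoot(x0, period_guess, tol=1e-10, max_iter=20):
    """Newton-Raphson correction of the trial period until v(T) = 0 (periodicity)."""
    period = period_guess
    for _ in range(max_iter):
        x_end, v_end = integrate(x0, period)
        if abs(v_end) < tol:
            return period
        # The derivative of v(T) with respect to T is the acceleration at time T
        period -= v_end / accel(x_end)
    raise RuntimeError("shooting did not converge")

# Sequential continuation: the converged period at one amplitude seeds the next one
amplitudes = [0.05 * k for k in range(1, 21)]
period = 2 * math.pi                      # linear period as the very first guess
fep = []                                  # (energy, frequency) pairs
for amp in amplitudes:
    period = shoot(amp, period)
    energy = 0.5 * amp**2 + 0.25 * amp**4  # potential energy at the turning point
    fep.append((energy, 2 * math.pi / period))
```

Each point of this toy FEP already costs several time integrations, which is the expense
that motivates the use of NLROMs in the text. Starting Newton-Raphson directly from the
linear period at a large amplitude would generally not converge to the fundamental period,
which is why the continuation loop proceeds in small amplitude steps.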


4.5.7 Nonlinear manifolds for reduction 


Up to now, we discussed techniques that approximate the HFM solution as a linear
combination of modes spanning a low-dimensional subspace. This is not the only
possibility, as many dynamical systems evolve on manifolds, rather than subspaces [44].
This is typically the case for systems characterized by slow and fast dynamics. An
example of this is static condensation (see Section 4.5.1), where the fast dynamics of the
in-plane motion is statically enslaved to the slow dynamics of the bending deformations.
The resulting mapping between fast and slow dynamics is in this case quadratic,
see equation (4.51). Once a condensed system is obtained through slow-fast
decomposition, a further reduction is of course possible. One could apply the techniques
discussed in Sections 4.5.2, 4.5.3 and 4.5.4 (i. e. based on linear subspaces for reduction)
or seek a manifold on which the solution is embedded. Such a manifold passes
through the equilibrium point and is there tangent to the corresponding linear mode.
Recently, the concept of Spectral Submanifold (SSM) was introduced by Haller and
Ponsioen in [21]. The SSM was defined as an invariant manifold asymptotic to an
NNM, and it is the smoothest nonlinear continuation of a modal subspace emanating
from the equilibrium. A discussion of the potential of SSMs for model reduction can be
found in [53], where SSMs are computed for a linear FE-discretized beam with a cubic
nonlinear spring acting at the tip. An interesting application of slow-fast decomposition
combined with SSM reduction is presented in [30] for the case of a geometrically
nonlinear FE-discretized beam.

An approximate method to construct a manifold for reduction, related to the MDs
method, is presented in [31] and then generalized in [58]. There, it is postulated that
the modal coordinates $q(t)$ are mapped to the HFM displacements $u(t)$, i. e.
$q \in \mathbb{R}^m \mapsto \Gamma(q) \in \mathbb{R}^n$, as

$$u(t) = \Gamma(q(t)) := \Phi \cdot q(t) + \frac{1}{2}\,\Omega \cdot q(t) \cdot q(t), \qquad (4.115)$$

where $\Phi \in \mathbb{R}^{n \times m}$ contains selected vibration modes, and
$\Omega \in \mathbb{R}^{n \times m \times m}$ is a third-order
tensor constructed with the MDs stemming from $\Phi$. The resulting manifold is therefore
quadratic in $q$, hence the name Quadratic Manifold (QM). The concept is illustrated
in Figure 4.9, for the case of a cantilevered plate. Instead of constituting additional
DOFs for the NLROM, the MDs provide the curvature of the manifold. The NLROM is
then obtained by inserting (4.115) into (4.30) and projecting onto the tangent space
$P_\Gamma(q)$ of the manifold, obtained as

$$P_\Gamma(q) = \frac{\partial \Gamma(q)}{\partial q}. \qquad (4.116)$$

Again, the interested reader is referred to [31] for details. The method was tested on
a fairly large FE model of a wing subject to a representative gust load. This approach
outperformed a POD reduction with the same number of modes (5 vibration modes
vs 5 POD modes) in terms of accuracy. It is worth mentioning that the QM technique
needs to be equipped with a compression technique for the resulting nonlinear reduced
terms. An exact tensorial form as outlined in Section 4.4.1.2 is cumbersome, as the
resulting nonlinear terms would contain tensors up to seventh order. Likely, the higher
order terms are not relevant for accuracy, and yet require the most intensive
computational and storage effort. A possible strategy could therefore be to neglect some of
the highest-order terms. Another possibility is hyper-reduction. It was shown in [32] that
ECSW can be extended to the case of nonlinear manifolds for reduction. In this work,
the same example tackled in [31] is considered, and better speed-ups than with traditional,
linear-basis POD reduction are obtained, as fewer elements are picked by the algorithm.

Figure 4.9: Illustrative picture of a quadratic manifold for reduction of a cantilevered plate. The
manifold is a function of the amplitudes of the first two vibration modes. The three MDs provide the
curvature of the manifold.


Arguably, this is due to the fact that the QM more compactly represents the physics of 
the problem than the POD basis. 
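
As a small numerical illustration of the quadratic mapping (4.115) and of its tangent
(4.116), the sketch below evaluates both with NumPy. The sizes and the random arrays are
placeholders (assumptions made here), not actual vibration modes or MDs; the tensor is
symmetrized in its last two indices, as is the case for static MDs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 2                                    # hypothetical HFM and ROM sizes
Phi = rng.standard_normal((n, m))               # placeholder for the vibration modes
Omega = rng.standard_normal((n, m, m))          # placeholder for the MD tensor
Omega = 0.5 * (Omega + Omega.transpose(0, 2, 1))  # symmetric in the last two indices

def gamma(q):
    """Quadratic mapping u = Phi q + 1/2 Omega q q, cf. (4.115)."""
    return Phi @ q + 0.5 * np.einsum("ijk,j,k->i", Omega, q, q)

def tangent(q):
    """Tangent basis dGamma/dq = Phi + Omega q (valid for symmetric Omega), cf. (4.116)."""
    return Phi + np.einsum("ijk,k->ij", Omega, q)

q = np.array([0.3, -0.1])
u = gamma(q)          # point on the manifold, shape (n,)
P = tangent(q)        # configuration-dependent projection basis, shape (n, m)
```

At q = 0 the tangent reduces to the modal basis itself, which reflects the statement that
the MDs only provide curvature and do not add DOFs.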

Still, if hyper-reduction is the way to go for reduction with nonlinear manifolds, 
one would desire to form the training sets without resorting to expensive HFM
simulations. Attempts in this direction are made in [57] and [29]. In the latter of these
contributions, the QM is in fact used to “lift” inexpensive linear modal solutions obtained
as described in Section 4.2. The obtained displacements are then used to generate 
nonlinear forces to train the ECSW selection. The method proved very effective for 
geometrically nonlinear problems, for which the QM provides a good description of 
the physical behavior. The NLROM, though, is still obtained by projection on linear 
subspaces containing vibration modes and modal derivatives. The extension of this 
technique to QM-based reduction is yet to be tried, but it is foreseen not to pose major 
hurdles. 


4.6 Overview on parametric nonlinear ROMs 


In the previous sections, projection-based NLROM strategies to select a suitable basis
and to evaluate the nonlinear system of equations have been discussed. As already
pointed out, most of the showcased methods do provide very high speed-ups
in the online phase, but they usually require some kind of overhead cost involving
offline training and/or pre-computations. This often heavily reduces the effective
gain provided by the ROM. Indeed, calling for convenience $t_{on}$ and $t_{off}$ the online and
offline times of a ROM and $t_f$ the time of a full order analysis, the effective speed-up
factor can be computed as $SP = t_f/(t_{on} + t_{off})$. It is clear that, if the same reduced model
is used for multiple (say $M$) queries requiring different analyses, this factor increases
to $SP_M = M t_f/(M t_{on} + t_{off})$. In other words, the offline costs become less important
as the number of evaluations increases. Parametric ROMs (PROMs) are then a natural
extension of this concept, as the number $M$ would include the sum of the number of
queries for each combination of parameters, making the reduction even more
profitable.
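
The effect of amortizing the offline cost can be made concrete with a few lines of code.
In this sketch the speed-up is taken as the full-order time divided by the total ROM time
(offline plus online), which is an assumption on the orientation of the ratio, and the
timings are invented numbers chosen only for illustration.

```python
def effective_speedup(t_full, t_online, t_offline, n_queries=1):
    """Full-order time for n_queries analyses divided by the total ROM time."""
    return n_queries * t_full / (n_queries * t_online + t_offline)

# Hypothetical timings in seconds (illustrative values, not measurements)
t_full, t_online, t_offline = 600.0, 6.0, 3000.0
speedups = {M: effective_speedup(t_full, t_online, t_offline, M)
            for M in (1, 10, 100, 1000)}
```

With these numbers a single query is slower with the ROM than with the full model, while
for a growing number of queries the factor approaches the purely online ratio
t_full / t_online from below.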

For linear systems, parametric reduced order modeling (PMOR) techniques are
well developed. Loosely speaking, the main challenges in PMOR reside in (i) the
construction of the model itself, (ii) the identification of a projection basis that is valid
over a space of parameters, and (iii) the selection of interpolation and sampling
strategies. The model can be constructed following a local approach (e. g. building several
ROMs for different parameters and interpolating the bases/ROMs/solutions) or a global
approach, where the full order system is parametrized and a single basis is selected.
For the latter case, again many options are available. Moment-matching (discussed
extensively in [7, Chapter 3]) is a popular strategy, where the basis is constructed by
requiring the nth derivatives (moments) of the reduced and full order systems with respect
to the parameters to match [17]. Another approach is to construct the basis by repeatedly
applying POD to a set of simulations selected through some kind of parameter-space
sampling procedure (e. g. Latin hypercube, Smolyak sparse grid). For an extensive
survey of existing methods, the reader is referred to [6].
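
A minimal sketch of the global, sampling-based approach is given below. It assumes a
hypothetical linear static model (a spring chain whose right half has stiffness p), uses a
plain uniform grid instead of Latin hypercube or sparse-grid sampling, and applies POD
once to all snapshots. None of these choices comes from the cited references.

```python
import numpy as np

n = 50
f = np.ones(n)                                   # uniform load

def stiffness(p):
    """Hypothetical parametrized model: chain fixed at both ends, right half has stiffness p."""
    k = np.ones(n + 1)
    k[n // 2:] = p
    return np.diag(k[:-1] + k[1:]) - np.diag(k[1:-1], 1) - np.diag(k[1:-1], -1)

# Offline: sample the parameter range and collect full-order solutions (snapshots)
samples = np.linspace(0.5, 2.0, 8)
snapshots = np.column_stack([np.linalg.solve(stiffness(p), f) for p in samples])

# POD: keep the left singular vectors above a relative singular-value tolerance
U, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
V = U[:, sigma / sigma[0] > 1e-10]

def rom_solve(p):
    """Online: Galerkin projection of K(p) u = f onto the single global basis V."""
    return V @ np.linalg.solve(V.T @ stiffness(p) @ V, V.T @ f)

p_new = 1.3                                      # parameter value not in the training set
u_full = np.linalg.solve(stiffness(p_new), f)
rel_error = np.linalg.norm(rom_solve(p_new) - u_full) / np.linalg.norm(u_full)
```

For this simple model the solutions live in a very low-dimensional space, so a handful of
POD vectors reproduces unseen parameter values accurately; realistic problems typically
need an error estimator to decide where to add samples.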

On the other hand, projection-based PMORs for nonlinear systems feature a much 
less mature theory, and constitute a broad and rapidly evolving area of research. Many 
of the methods available in the literature try to carry over concepts from linear analysis 
to the nonlinear case. In this sense, POD represents the most widely exploited method 
due to its versatility and properties, such as error bounds and the optimality of the
reduction basis it provides. In [4] for instance, a PROM was developed for contact problems
by performing a number of full order simulations sampling the parameter space, then 
taking the POD of each and clustering all the resulting bases. Using a computation- 
ally cheap error estimator, this process is repeated in order to refine the approxima- 
tion. A lot of other strategies involving interpolation of such full order simulations 
and/or POD bases also exist. In [70] the data coming from the full order simulations 
is manipulated using a two-level radial basis function interpolation method to obtain 
hyper-surfaces in the parameters, while in [22] the interpolation step is carried out us- 
ing neural networks to compute the interpolation coefficients. All the aforementioned 
strategies belong to the family of (non-intrusive) data-driven methods, and they all 
demand a large upfront cost to develop the ROM itself. Nevertheless, these
approaches may be the only viable way to deal with complex problems
such as the ones in fluid mechanics, which often provide the benchmarks for these
algorithms.

In the context of geometric nonlinearities in solid mechanics, however, we
showed that under some circumstances the nonlinear internal forces take specific
forms (e. g. third-order polynomials), and thus some information from the model
can be exploited for the construction of the reduction basis (e. g. MDs) without the
need for full simulations. In [43] for instance, such information is used to construct
a PROM that describes shape defects of a structure in a parametric fashion, relying
solely on the model of the system. Even though model-driven approaches seem to
go against the trend of a considerable share of the scientific community, which gradually
but steadily moves towards machine learning, we deem such strategies still
worth investigating, not only for their possible advantages over other techniques, but
also for their engineering value in terms of interpretability and ease of use.

The parametrization could also be intended as time-dependent, as done for
instance in [9] and [63] for the reduction of models of meshing gears. Here, a reduction
basis that reproduces the contact behavior between gear teeth is made a
function of a time-dependent load position parameter, and appended to a constant
basis that spans the global behavior of the system. The local, parametrized modes can
be either residual attachment modes [63], or static solutions at specific configurations
[9]. The time variation of the reduction basis then generates additional, state-dependent
terms in the ROM, which are shown to be important for stress recovery. In all
these contributions, the time variation of the parameter is known a priori. Likewise, a
prescribed time-dependent law could parameterize the equilibrium of a given system.
This is the case of thermo-mechanical systems, where the temperature field is typically
determined by solving the heat equation, and then applied to the mechanical problem.
Typically, there is a large difference between the characteristic time scales of the thermal
and mechanical problems, the latter being characterized by much faster dynamics. In
this case, the slow variation of the applied thermal field justifies a parametric
equilibrium and a parametric reduction basis, which can be conveniently interpolated
online; see [33]. For the same reason, the state-dependent terms that are otherwise present
(see again [9, 63]) can be neglected.


4.7 Conclusions 


We briefly overviewed the most popular methods for model order reduction of linear
and geometrically nonlinear mechanical systems. The approaches considered here are
system-driven, meaning that the reduced order model is built from quantities
that can be derived directly from the discretized equations governing the high fidelity
model. This is in contrast with data-driven techniques, which build the reduced
order model from solutions of the high fidelity model. The main appeal of
system-driven methods is their independence from computationally expensive solutions and
their exploitation of the system characteristics.

In structural dynamics applications, the modal displacement method has been
a standard for decades. The full order solution is approximated by a combination
of a few carefully selected eigenmodes, which are also used to span the space
onto which the system is projected. The resulting reduced order model consists of an
uncoupled system of modal equations which can be easily solved. Enhancements to the
modal displacement method aim at retrieving the contribution of discarded vectors
by quasi-static corrections.
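
The decoupling obtained by projecting onto a few eigenmodes can be verified numerically.
The sketch below assumes a hypothetical spring-mass chain with lumped (diagonal) mass
matrix; the matrices and the number of retained modes are illustrative choices only.

```python
import numpy as np

n = 20
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # unit springs, both ends fixed
mass = np.linspace(1.0, 2.0, n)                            # lumped masses
M = np.diag(mass)

# Generalized eigenproblem K phi = omega^2 M phi, symmetrized with M^(-1/2)
scale = 1.0 / np.sqrt(mass)
omega2, Y = np.linalg.eigh(scale[:, None] * K * scale[None, :])
Phi = scale[:, None] * Y[:, :3]        # three lowest modes, mass-normalized

# Projection onto the retained modes gives diagonal (uncoupled) reduced matrices
M_r = Phi.T @ M @ Phi                  # identity
K_r = Phi.T @ K @ Phi                  # diag(omega_1^2, omega_2^2, omega_3^2)
```

Each reduced coordinate then obeys an independent single-DOF equation, which is what makes
the modal displacement method inexpensive for linear systems.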

The same strategy, namely modal expansion and projection, can also be employed
for nonlinear systems. Here, we restricted ourselves to systems with geometric
nonlinearities, which are typically modeled by smooth nonlinear functions of the displacements.
Systems affected by geometric nonlinearities usually feature a slow dominant behavior,
which is complemented by fast dynamics. The slow dynamics (typically featuring
out-of-plane motion for the case of thin-walled structures) can still be represented by a
set of dominant vibration modes selected by the modal displacement method. The fast
dynamics is usually enslaved to the slow one, as it is given by the bending-stretching
coupling due to the geometric nonlinearity. In fact, for flat systems where the in-plane
and out-of-plane dynamics can be clearly separated, a nonlinear mapping between
out-of-plane (slow) and in-plane (fast) dynamics, usually referred to as static
condensation, can be established by neglecting the in-plane inertial terms.

For more geometrically complex systems, this separation of the degrees of freedom
might be hard or impossible to perform. In this case, one can of course rely on
expansion and projection onto a subspace that is able to reproduce the nonlinear
coupling effects. Two methods to form such a basis gained popularity over the past two
decades, namely modal derivatives and dual modes. Both methods find the most
relevant in-plane contributions to best span the nonlinear behavior. While the dual modes
method is inherently non-intrusive, as it requires several nonlinear static solutions of
the high fidelity model, the modal derivative strategy can also be applied intrusively
if the directional derivative of the tangent stiffness matrix along the dominant linear
modes can be calculated.
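
The role of the directional derivative of the tangent stiffness can be illustrated with
static modal derivatives, here taken as theta_ij = -K^{-1} (dK_t/d eta_j) phi_i with
inertia neglected. The sketch below is only an illustration under stated assumptions: the
model is a hypothetical chain of springs with polynomial force law, and the derivative is
approximated by central finite differences of a user-supplied tangent stiffness function.

```python
import numpy as np

n, k2, k3 = 6, 0.5, 2.0     # hypothetical chain size and spring nonlinearity coefficients

# Incidence matrix: elongation of spring e is (G @ u)[e]; the chain is fixed at both ends
G = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)

def tangent_stiffness(u):
    """K_t(u) for springs with force e + k2 e^2 + k3 e^3 in the elongation e."""
    e = G @ u
    return G.T @ np.diag(1.0 + 2.0 * k2 * e + 3.0 * k3 * e**2) @ G

K0 = tangent_stiffness(np.zeros(n))
_, modes = np.linalg.eigh(K0)          # unit masses assumed, so modes of K0 suffice
Phi = modes[:, :2]                     # two dominant linear modes

def static_md(i, j, h=1e-4):
    """Static MD of mode i with respect to the amplitude of mode j."""
    dK = (tangent_stiffness(h * Phi[:, j]) - tangent_stiffness(-h * Phi[:, j])) / (2 * h)
    return -np.linalg.solve(K0, dK @ Phi[:, i])

theta = {(i, j): static_md(i, j) for i in range(2) for j in range(2)}
# Reduction basis: modes augmented with the distinct (symmetric) modal derivatives
V = np.column_stack([Phi, theta[0, 0], theta[0, 1], theta[1, 1]])
```

Only the tangent stiffness is queried here, so the same recipe applies whenever a solver
can return that matrix at a prescribed displacement.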

Indeed, the terms non-intrusive and intrusive refer to two distinct classes of
reduction methods. The distinction lies in whether or not it is possible to access
element-level functions to compute the projected reduced order terms. In non-intrusive methods,
the structure of these terms is first postulated and then identified, typically by
probing the system with a sufficient number of nonlinear static cases, where either
displacements or forces are imposed on the high fidelity model. We discussed the methods of enforced
displacements and enhanced enforced displacements, which identify the coefficients
of the projected nonlinear forces by statically applying displacements determined by
combinations of the modes used for reduction. The number of such evaluations is
$O(m^3)$ and $O(m^2)$ for the enforced displacement and the enhanced enforced
displacement method, respectively. The latter gain is due to the assumed availability of the
tangent stiffness matrix from the high fidelity model. The method of implicit
condensation and expansion differs from the former two as it relies only on dominant vibration
modes, and not on either dual modes or modal derivatives, to construct the reduced
order model. A recovery of the full displacement field is obtained through an
expansion step that assumes a quadratic dependency of the in-plane modes on the dominant
vibration modes. In intrusive methods, on the contrary, the reduced nonlinear forces
are computed exactly by evaluating and projecting at element level. This is efficient if
the resulting reduced terms are functions of the modal coordinates only, i. e. no access to
the mesh is required any longer once the reduced order model is constructed. This is
indeed the case for polynomial terms, resulting from a Lagrangian formulation and
elastic material of continuum elements, or plates modeled with the von Kármán kinematic
assumptions. In more complex cases, the evaluation at element level is unavoidable,
and the speed-up associated with a reduced order model drops significantly. In these
cases, one can resort to hyper-reduction, which is extensively treated in [8, Chapter 5].
Essentially, hyper-reduction scales the computation of the reduced terms with the size
of the reduction basis and not with that of the high fidelity model, and thus
delivers large speed-ups. Here, we outlined the Discrete Empirical Interpolation Method
(DEIM) and the Energy Conserving Sampling and Weighting (ECSW) method. Both
methods rely on training forces, which typically come from the Proper Orthogonal
Decomposition of high fidelity solutions. As such, they could be classified as data-driven
methods and would therefore not belong to this chapter. However, recent trends
target the construction of such training sets without resorting to expensive full solutions.
For instance, in the case of geometric nonlinearities, one can lift computationally
inexpensive linear modal solutions onto a quadratic manifold centered around the equilibrium,
whose curvature is provided by modal derivatives. The resulting displacements
generate forces which can be used for training the hyper-reduction.
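
The idea of postulating the form of the reduced nonlinear force and identifying its
coefficients from static evaluations can be sketched for a single mode. The example is a
simplification in the spirit of the enforced displacement approach: the "high fidelity
model" is a hypothetical spring chain accessed only through a black-box force function,
and the reduced force is assumed to be a q^2 + b q^3.

```python
import numpy as np

n = 6
k2 = np.linspace(0.2, 1.0, n + 1)      # hypothetical quadratic spring coefficients
k3 = 2.0                               # hypothetical cubic spring coefficient
G = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)   # elongations e = G @ u, ends fixed

def nonlinear_force(u):
    """Black-box stand-in for the HFM: nonlinear part of the internal force."""
    e = G @ u
    return G.T @ (k2 * e**2 + k3 * e**3)

_, modes = np.linalg.eigh(G.T @ G)     # linear modes for unit springs and unit masses
phi = modes[:, 0]                      # single mode retained in the ROM

# Two static evaluations with enforced displacements +delta*phi and -delta*phi
delta = 1e-2
f_plus = phi @ nonlinear_force(+delta * phi)
f_minus = phi @ nonlinear_force(-delta * phi)
a = (f_plus + f_minus) / (2 * delta**2)    # quadratic coefficient
b = (f_plus - f_minus) / (2 * delta**3)    # cubic coefficient

def reduced_force(q):
    return a * q**2 + b * q**3
```

With several modes, the cross-coupling coefficients require combinations of two and three
modes, which is where the cubic growth of the number of static evaluations comes from.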

The slow-fast dichotomy of the behavior of nonlinear systems in fact favors
reduction methods based on nonlinear manifolds, rather than projection on linear
subspaces. In other words, the solution is assumed to evolve on curved, rather than
flat, manifolds, which can be constructed via asymptotic expansions around the
equilibrium point. Along this direction, the recently proposed spectral submanifold is
defined as the smoothest nonlinear continuation of a modal subspace at the equilibrium
point. Likewise, we briefly discussed the reduction on quadratic manifolds constructed
with vibration modes and modal derivatives. As the manifold can be seen as a nonlinear
constraint on the solution, a configuration-dependent mass matrix and convective
generalized forces are generated by this reduction method. Reduction through
nonlinear manifolds exacerbates the issue of efficient computation of the reduced nonlinear
terms. In the case of polynomial nonlinearities for the high fidelity model, the resulting
reduced tensors are of higher order than the ones corresponding to linear projection.
This could quickly exhaust memory resources and slow down the time integration of
the reduced system significantly. Recent contributions are tackling this problem by
applying, for instance, hyper-reduction.
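
The origin of the configuration-dependent mass matrix and of the convective forces can be
seen from a short derivation. The sketch below assumes an undamped high fidelity model of
the form $M\ddot{u} + f(u) = g(t)$ and a generic manifold $u = \Gamma(q)$ with tangent
$P_\Gamma(q)$; the symbols $M$, $f$ and $g$ are notation introduced here for illustration.

```latex
% Assumed HFM (illustrative): M \ddot{u} + f(u) = g(t), constrained to u = \Gamma(q)
\begin{align*}
\dot{u} &= P_\Gamma(q)\,\dot{q}, \qquad
  P_\Gamma(q) = \frac{\partial \Gamma(q)}{\partial q}, \\
\ddot{u} &= P_\Gamma(q)\,\ddot{q}
  + \Big(\frac{\partial P_\Gamma(q)}{\partial q}\cdot\dot{q}\Big)\dot{q}, \\
\underbrace{P_\Gamma^T M P_\Gamma}_{\text{configuration-dependent mass}}\,\ddot{q}
  &+ \underbrace{P_\Gamma^T M
     \Big(\frac{\partial P_\Gamma}{\partial q}\cdot\dot{q}\Big)\dot{q}}_{\text{convective term}}
  + P_\Gamma^T f(\Gamma(q)) = P_\Gamma^T g(t).
\end{align*}
```

For a linear subspace the tangent is constant, so its derivative vanishes and the reduced
mass matrix no longer depends on $q$.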

Lastly, we briefly discussed parametric model order reduction. In this context, the
general trend is to rely on data-driven methods, usually based on POD, to construct
the parametrized reduced order model. In general this could be done either by forming
a large reduction basis that spans the parameter space of interest, or by constructing
local reduced order models and interpolating across the parameter space. In either case,
a greedy procedure often drives the generation of the parametric reduced order model.
However, for the case of geometric nonlinearity, it is possible to exploit the structure of
the governing equations and devise a reduction basis containing sensitivities of modes
with respect to parameters, for instance in the case of shape imperfections. This
procedure is completely system-driven and it is shown to span the parameter space in the
neighborhood of the nominal parameter values.


Stefano Grivet-Talocia and Luis Miguel Silveira
5 Post-processing methods for passivity 
enforcement 


Abstract: Many physical systems are passive (or dissipative): they are unable to gen- 
erate energy on their own, but they can store energy in some form while exchanging 
power with the surrounding environment. This chapter describes the most prominent 
approaches for ensuring that Reduced Order Models are passive, so that their math- 
ematical representation satisfies an appropriate dissipativity condition. The main fo- 
cus is on Linear and Time-Invariant (LTI) systems in state-space form. Different condi- 
tions for testing passivity of a given LTI model are discussed, including Linear Matrix 
Inequalities (LMIs), Frequency-Domain Inequalities, and spectral conditions on asso- 
ciated Hamiltonian matrices. Then we describe common approaches for perturbing 
a given non-passive system to enforce its passivity. Various examples from electronic 
applications are used to demonstrate both theory and algorithm performance. 


Keywords: passivity, dissipativity, positive real lemma, bounded real lemma, Hamil- 
tonian matrices, state-space systems, descriptor systems, eigenvalue perturbation 


5.1 Introduction and motivations 


Let us consider the problem of designing a complete electronic product, such as a
smartphone or a high-end computing server. The complexity of such a system is
overwhelming: a single microprocessor might include several billion transistors, and this
is just one component. All components are tightly interconnected to exchange signals
and power: they interact both through electrical connections and through (unwanted)
electromagnetic couplings, which are inevitable due to the close proximity of
components in tightly integrated systems. A proper design flow must ensure that all signals
behave as expected during real operation, which requires accounting for all
interactions between components and subsystems. A first-pass design can only be achieved
by extensive numerical simulation at the system level, in order to verify full
compliance with specifications.

All of us would agree that a direct, brute-force simulation of the complete
system is totally unrealistic. This is why common engineering practices partition a given
complex system into several simpler subsystems, which are modeled independently.
All models are then interconnected to obtain a system description that is amenable
to numerical simulation. These individual models are very often obtained through
some Model Order Reduction scheme applied to some initial device-level
characterization.

Suppose now that one of the above individual models represents some signal 
or power interconnect network. Such an interconnect structure is intended to feed
signal and power supply to all elements of the system, in the form of electrical current
flowing through metal wires. The interconnect network is unable to generate en- 
ergy on its own, but rather redistributes the energy that it receives from its input 
signals to its output signals. It may store energy through electric and magnetic field 
densities in the physical space surrounding the interconnect, and it may dissipate 
energy as heat through metal and dielectric losses, but no more energy can be sup- 
plied to the environment than the amount of energy previously stored. Such a system 
is called dissipative (or passive). The concept of dissipativity naturally arises from 
energy conservation principles and is therefore ubiquitous in several engineering 
fields. 

A (reduced order) model of the interconnect must respect such a property: the
simulation model must not be able to release more energy than previously stored. This
requirement is not just for self-consistency with fundamental physical principles, but also has
a very practical reason: a non-passive model may trigger instabilities during
system-level simulation. An example is provided in Figure 5.1, which compares the voltage
received by a state-of-the-art (at the time of writing) smartphone through a high-speed
interconnect, computed using a passive vs. non-passive model connected to various
other system parts, including driver and receiver circuits that send and receive the
signals. The non-passive model triggers a resonance, by injecting a continuous flow
of power that is responsible for the instability. Conversely, the passive model provides
a well-behaved bounded response.


[Figure 5.1 (plot): responses of the non-passive model and the passive model versus
Time (ns), from 5 to 10 ns.]


Figure 5.1: Comparison between the responses of passive (thick line) and non-passive (thin line) 
models of a high-speed smartphone interconnect link. 


A fundamental result states that the interconnection of passive subsystems leads to
a passive system; see e. g. [74]. Therefore, a guarantee of passivity for all individual
models for which this requirement is adequate¹ is also a guarantee that a model-based
system-level simulation will run smoothly. This is a relevant problem not only for
electronic applications, but for several applied engineering fields such as mechanics
or fluid dynamics. Energy conservation or dissipation properties must be preserved in
the simulation models.

Figure 5.1 illustrates in a clear manner that any modeling procedure used for anal- 
ysis of the dynamics of dissipative physical systems should ensure that the resulting 
model or reduced order model is dissipative. There exist MOR algorithms that are able 
to preserve dissipativity if applied to an original large-scale dissipative model. Exam- 
ples are the PRIMA algorithm [63, 67] (see also [6, Chapter 4]) or the PR-TBR algo- 
rithm [19, 62, 64, 65, 68] (see also [5, Chapter 2]). Unfortunately, for a variety of reasons, 
possibly including efficiency considerations, such schemes are not always applicable, 
and one has to resort to one of the many reduced order modeling techniques that are 
not able to preserve or enforce dissipativity. Therefore, it is often necessary to perform 
a-posteriori checks and possibly implement a post-processing procedure that enforces 
model passivity. 

In this chapter, we review the various forms in which the passivity conditions of a
model can be stated. The particular class of systems that we focus on is defined in Sec- 
tion 5.2, although generalizations are discussed in Section 5.6. Different forms of pas- 
sivity conditions will lead to corresponding different numerical schemes for their veri- 
fication, discussed in Section 5.3. We then present in Section 5.5 a selection of methods 
for passivity enforcement, mainly cast as perturbation approaches that, starting from 
the original non-passive model, update its coefficients in order to achieve passivity. 
Model accuracy is retained by minimizing the perturbation amount in some norm, as 
discussed in Section 5.4. 

The style of this chapter is informal, with main results being stated with some 
essential derivation, but without a formal proof. Emphasis is on the practical aspects 
of the various formulations, which lead to algorithms presented in pseudocode form. 
Pointers to the relevant literature are provided for additional insight. 


5.2 Passivity conditions 


In order to keep this chapter self-contained, our discussion is based on the special 
class of Linear Time-Invariant (LTI), finite-dimensional systems in regular state-space 
form 


$$\mathcal{S}:\ \begin{cases} \dot{x}(t) = A x(t) + B u(t), \\ y(t) = C x(t) + D u(t), \end{cases} \qquad (5.1)$$


1 Note that not all components are passive: for instance, signal or power sources or amplifier circuits
do not and must not be expected to behave as passive elements.

where $t$ denotes time, vectors $u \in \mathcal{U} \subseteq \mathbb{R}^M$ and
$y \in \mathcal{Y} \subseteq \mathbb{R}^M$ collect the system inputs
and outputs, respectively, and $x \in \mathcal{X} \subseteq \mathbb{R}^N$ is the state vector,
with $\dot{x}$ denoting its time derivative. The $M \times M$ transfer function of the system is


$$H(s) = C(sI - A)^{-1}B + D, \qquad (5.2)$$


where s is the Laplace variable. We start by assuming the system to be asymptotically 
stable, with all eigenvalues of A, i. e., the poles of H(s), having a strictly negative real 
part. 

The above assumptions may seem overly restrictive, but most common macromodeling
schemes that are widespread in electronic applications, such as the Vector
Fitting algorithm [5, Chapter 8], produce reduced order models in this form.
Generalizations will be discussed in Section 5.6.
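
As a concrete instance of (5.1) and (5.2), the sketch below evaluates the transfer function
of a small state-space model and checks asymptotic stability through the eigenvalues of A.
The one-port series RLC admittance used as test case, its element values and the function
names are assumptions made here for illustration.

```python
import numpy as np

def transfer_function(A, B, C, D, s):
    """Evaluate H(s) = C (sI - A)^{-1} B + D at a single complex frequency s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

def is_asymptotically_stable(A):
    """True if all eigenvalues of A have a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0.0))

# Hypothetical example: series RLC one-port, input = port voltage, output = port current,
# state x = [inductor current, capacitor voltage]
R, L, Cap = 1.0, 1.0, 1.0
A = np.array([[-R / L, -1.0 / L], [1.0 / Cap, 0.0]])
B = np.array([[1.0 / L], [0.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

s = 2.0j
H = transfer_function(A, B, C, D, s)   # 1 x 1 admittance at s = j2
```

For this circuit the result can be compared with the closed-form admittance
sC / (s^2 LC + sRC + 1), which is a convenient sanity check of the state-space matrices.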


5.2.1 Dissipative systems 


The system $\mathcal{S}$ in (5.1) is dissipative [15, 74, 88] with respect to the supply function
$s : \mathcal{U} \times \mathcal{Y} \to \mathbb{R}$ if there exists a storage function
$V : \mathcal{X} \to \mathbb{R}$ such that


$$V(x(t_1)) \le V(x(t_0)) + \int_{t_0}^{t_1} s(u(t), y(t))\,dt \qquad (5.3)$$


for all $t_0 \le t_1$ and all input, state and output signals $u$, $x$, $y$ that satisfy the state-space
equations (5.1). In the above definition, $V$ represents the internal energy that the
system is able to store, and $s$ is the power flow exchanged by the system with the
environment. Thus, for a dissipative system the increase in the internal energy that the system
undergoes during any time interval $(t_0, t_1)$ cannot exceed the cumulative amount of
energy received from the environment, expressed as a time integral of the input power
flow. If the storage function is differentiable, the dissipation inequality (5.3) can also
be cast in the equivalent form


$$\frac{d}{dt} V(x(t)) \le s(u(t), y(t)). \qquad (5.4)$$


As a typical example, one may consider an electric RLC circuit made of an
arbitrary number of arbitrarily connected resistors $R_k > 0$, inductors $L_k > 0$ and capacitors
$C_k > 0$, which interacts with the environment through $M$ ports. Each port defines an
input, e. g., the port voltage $v_j$, with the port current $i_j$ acting as the corresponding
output (this representation is called admittance). For this example, the state vector
comprises all capacitor voltages $v_{C_k}$ and inductor currents $i_{L_k}$, so that the energy storage
function is defined as $V := \frac{1}{2}\left(\sum_k C_k v_{C_k}^2 + \sum_k L_k i_{L_k}^2\right)$.
The electric power entering the circuit at the $j$th port is $v_j i_j$, so that the power supply
function reads $s(u, y) = u^T y = \sum_j v_j i_j$.
By Tellegen’s (power conservation) theorem [4, 24], we have

$$s(u,y) = \sum_j v_j i_j = \sum_k v_{C_k} i_{C_k} + \sum_k v_{L_k} i_{L_k} + \sum_k R_k i_{R_k}^2
= \frac{dV}{dt} + \sum_k R_k i_{R_k}^2 \ge \frac{dV}{dt}, \qquad (5.5)$$

where we used the definition of capacitor currents $i_{C_k} = C_k \frac{dv_{C_k}}{dt}$, inductor voltages
$v_{L_k} = L_k \frac{di_{L_k}}{dt}$, and where $i_{R_k}$ are the resistor currents. So, we see that any RLC circuit with
positive elements is dissipative.


The system S in (5.1) is called strictly dissipative [74, 88] with respect to the supply function $s$ if the stronger condition

$$V(x(t_1)) \le V(x(t_0)) + \int_{t_0}^{t_1} s(u(t), y(t))\, dt - \varepsilon \int_{t_0}^{t_1} \|u(t)\|^2\, dt \tag{5.6}$$

holds for some $\varepsilon > 0$ instead of (5.3), which is thus satisfied with a strict inequality.
The following three subsections provide different equivalent passivity conditions that are applicable to linear state-space systems in the form (5.1).


5.2.1.1 Linear matrix inequalities 


Building on the above example, we consider for the general system (5.1) a quadratic$^2$ storage function $V(x) = \frac{1}{2} x^T P x$ associated to a symmetric and positive definite matrix $P = P^T > 0$. Also, we adopt the same supply function$^3$

$$s(u, y) = u^T y = y^T u = \frac{1}{2}\left(u^T y + y^T u\right). \tag{5.7}$$


2 It is well known [89] that, if a storage function $V$ satisfying (5.3) for system (5.1) exists, it can be found as a positive definite quadratic form.

3 In many physical systems power is expressed as the product of relevant variables, such as voltage-current in electrical circuits, pressure-flow in hydraulic systems, and force-velocity in mechanical systems.

Imposing the dissipation inequality in differential form (5.4) leads to

$$\frac{d}{dt}\left(\frac{1}{2} x^T P x\right) = \frac{1}{2}\left(\dot{x}^T P x + x^T P \dot{x}\right) \le \frac{1}{2}\left(u^T y + y^T u\right) \tag{5.8}$$

and using (5.1) to eliminate $\dot{x}$ and $y$, we obtain the following condition:

$$\begin{pmatrix} x \\ u \end{pmatrix}^T \begin{pmatrix} A^T P + PA & PB - C^T \\ B^T P - C & -D - D^T \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} \le 0, \quad P = P^T > 0, \quad \forall x, u, \tag{5.9}$$

which provides the passivity condition for the state-space system (5.1). This condition leads to the well-known Positive Real Lemma (PRL), which is a particular case of the Kalman-Yakubovich-Popov (KYP) lemma [51, 66, 92] (see also [2, 56, 74]), and which states that the state-space system (5.1) is passive if and only if

$$\exists P = P^T > 0 : \quad \begin{pmatrix} A^T P + PA & PB - C^T \\ B^T P - C & -D - D^T \end{pmatrix} \le 0. \tag{5.10}$$


This class of conditions is generally known as Linear Matrix Inequalities (LMIs) [2, 13, 
14, 87]. For a strictly passive (dissipative) system (5.10) holds with a strict inequality. 
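
As a small illustration of (5.10), the following sketch checks whether a given candidate matrix $P$ certifies passivity. It is not an LMI solver: the function names, the one-state example and the hand-picked $P$ are assumptions made only for illustration, and a failed check for one particular $P$ does not prove that the system is non-passive.

```python
import numpy as np

def prl_matrix(A, B, C, D, P):
    """Assemble the block matrix appearing in the PRL condition (5.10)."""
    return np.block([[A.T @ P + P @ A, P @ B - C.T],
                     [B.T @ P - C,     -D - D.T]])

def certifies_passivity(A, B, C, D, P, tol=1e-10):
    """Return True if P = P^T > 0 and the PRL matrix is negative semidefinite."""
    p_ok = np.allclose(P, P.T) and np.linalg.eigvalsh(P).min() > 0
    F = prl_matrix(A, B, C, D, P)
    return bool(p_ok and np.linalg.eigvalsh(F).max() <= tol)

# Hypothetical one-state example: H(s) = 1/(s + 1) + 0.5
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.5]])
P = np.array([[1.0]])          # candidate storage matrix, chosen by hand

print(certifies_passivity(A, B, C, D, P))
```

For this example the PRL matrix is diagonal with entries $-2$ and $-1$, so the candidate $P$ is a valid certificate. Finding $P$ for a general model is what the LMI solvers cited in Section 5.3.1 are for.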


5.2.1.2 Frequency-domain inequalities 


An equivalent condition for passivity characterization is provided by a frequency- 
domain inequality. It is well known [2, 81, 90] that the transfer function of a general 
passive system must be Positive Real (PR), i.e., the following three conditions must 
hold: 

1. $H(s)$ must be regular in the open right half complex plane $\Re\{s\} > 0$;

2. $H(s^*) = H^*(s)$, where $^*$ denotes the complex conjugate;

3. $\Psi(s) = H(s) + H^H(s) \ge 0$ for $\Re\{s\} > 0$.


Condition 1 is directly related to the stability of $H(s)$, which is here assumed a priori; condition 2 implies that the impulse response of the system is real-valued; condition 3 completes the passivity characterization through a Frequency-Domain Inequality. The connection between the PR conditions and the PRL/KYP Lemma is well developed and proved in [2].

Since by our assumption all the poles of $H(s)$ are strictly stable, and since the adopted state-space realization is real-valued, both conditions 1 and 2 are automatically satisfied, whereas condition 3 can be restricted to the imaginary axis $s = j\omega$ by the minimum principle of harmonic functions [69], showing that

$$\Psi(j\omega) = H(j\omega) + H^H(j\omega) \ge 0 \quad \forall \omega \in \mathbb{R}, \tag{5.11}$$


where $^H$ denotes Hermitian transpose and $j$ is the imaginary unit. Continuing on the same RLC circuit example above, the latter condition states that the input admittance (matrix) of the circuit block must be nonnegative (Hermitian) definite, which in the scalar case $M = 1$ reduces to the requirement that the real part of the input admittance or impedance of any passive one-port element must be nonnegative at any frequency.
Condition (5.11) can be further conveniently rewritten as

$$\lambda_i \ge 0, \quad \forall \lambda_i \in \lambda(\Psi(j\omega)), \quad \forall \omega \in \mathbb{R}, \tag{5.12}$$

where $\lambda(\cdot)$ denotes the set of all eigenvalues of its matrix argument. Nonnegativity can thus be tested for all individual frequency-dependent eigenvalue trajectories $\lambda_i(j\omega)$ of $\Psi(j\omega)$, for $1 \le i \le M$. Inequalities (5.11) and (5.12) are strict for $\omega \in \mathbb{R} \cup \{\infty\}$ in the case of strictly passive systems.
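
The eigenvalue trajectories in (5.12) can be explored numerically by sampling $\Psi(j\omega)$ on a frequency grid, as in the minimal sketch below. The helper names and the one-state test model $H(s) = -1/(s+1) + 0.5$ are assumptions chosen for illustration; for this model $\Psi(j\omega) = 1 - 2/(1+\omega^2)$, which is negative for $|\omega| < 1$.

```python
import numpy as np

def transfer(A, B, C, D, s):
    """Evaluate H(s) = C (sI - A)^{-1} B + D at a complex frequency s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

def psi_eigenvalues(A, B, C, D, omegas):
    """Eigenvalues of Psi(jw) = H(jw) + H(jw)^H for each w in omegas."""
    rows = []
    for w in omegas:
        H = transfer(A, B, C, D, 1j * w)
        rows.append(np.linalg.eigvalsh(H + H.conj().T))
    return np.array(rows)            # shape (len(omegas), M)

# Hypothetical non-passive one-state model
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[-1.0]])
D = np.array([[0.5]])

omegas = np.linspace(0.0, 5.0, 501)
lam = psi_eigenvalues(A, B, C, D, omegas)
# Grid points where the smallest eigenvalue is clearly negative
violating = omegas[lam.min(axis=1) < -1e-12]
print(violating.min(), violating.max())
```

A sampled test of this kind can miss narrow violation bands that fall between grid points, which is one motivation for the algebraic tests discussed in this chapter.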


5.2.1.3 Hamiltonian matrices 


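There is a third class of conditions that can be used to characterize a passive system, based on the so-called Hamiltonian matrix associated to (5.1). We introduce this matrix by finding the set of spectral zeros of $\Psi(s)$. Let us assume that $\Psi(s_0) v = 0$ for some vector $v \ne 0$, with $s_0 \in \mathbb{C}$. Using (5.2) we have

$$\left[ C(s_0 I - A)^{-1} B + D + B^T (-s_0 I - A^T)^{-1} C^T + D^T \right] v = 0. \tag{5.13}$$

Let us define

$$\begin{aligned} r &= (s_0 I - A)^{-1} B v \quad &\Rightarrow \quad s_0 r &= A r + B v, \\ q &= (-s_0 I - A^T)^{-1} C^T v \quad &\Rightarrow \quad s_0 q &= -A^T q - C^T v. \end{aligned} \tag{5.14}$$

Substituting in (5.13) and solving for $v$ under the assumption that $W_0 = D + D^T$ is nonsingular (see Section 5.6 for a generalization) leads to

$$v = -W_0^{-1}\left(C r + B^T q\right), \tag{5.15}$$

which, inserted in (5.14), again leads to

$$\underbrace{\begin{pmatrix} A - B W_0^{-1} C & -B W_0^{-1} B^T \\ C^T W_0^{-1} C & -A^T + C^T W_0^{-1} B^T \end{pmatrix}}_{M} \begin{pmatrix} r \\ q \end{pmatrix} = s_0 \begin{pmatrix} r \\ q \end{pmatrix}. \tag{5.16}$$

The matrix $M$ in (5.16) has (real) Hamiltonian structure, since

$$(JM)^T = JM \quad \text{where} \quad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}. \tag{5.17}$$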
It is easily shown that the corresponding eigenspectrum is symmetric with respect to both the real and the imaginary axis. Condition (5.16) states that the spectral zeros of $\Psi(s)$ are the eigenvalues of the Hamiltonian matrix $M$. If one of these eigenvalues is purely imaginary, $s_0 = j\omega_0$, then we may have a violation of the frequency-domain inequality (5.12). In fact, if (5.16) holds for some purely imaginary $s_0 = j\omega_0$, then $\Psi(j\omega)$ is singular at $\omega_0$, implying that one of its eigenvalues $\lambda_i(j\omega)$ vanishes at $\omega_0$. If this zero is simple, then the eigenvalue trajectory $\lambda_i(j\omega)$ changes sign at $\omega_0$, thus violating the passivity condition (5.12).

The above observations can be summarized in the following statements [12, 33]. Assuming that $A$ is asymptotically stable, and that $W_0 = D + D^T > 0$ is strictly positive definite, system (5.1) is strictly passive if and only if the Hamiltonian matrix $M$ in (5.16) has no purely imaginary eigenvalues. In the presence of purely imaginary eigenvalues, the system is passive only if the associated Jordan blocks have even size, in which case it can be shown that the corresponding eigenvalue trajectory $\lambda_i(j\omega)$ does not change sign at $\omega_0$. A qualitative illustration of the above statements is provided by Figure 5.2. For a more complete treatment, which is outside the scope of this introductory chapter, see [1, 54]. We remark that the condition $W_0 > 0$ is equivalent to requiring that the system is asymptotically passive, so that the transfer function is nonnegative Hermitian for $\omega \to \infty$.
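
The construction of $M$ in (5.16) and the structure property (5.17) can be checked numerically with a few lines of code. The sketch below is only illustrative: the function names and the two one-state test models are assumptions, and a general-purpose eigensolver is used rather than a structure-preserving one.

```python
import numpy as np

def hamiltonian(A, B, C, D):
    """Hamiltonian matrix M of (5.16); requires W0 = D + D^T nonsingular."""
    W0_inv = np.linalg.inv(D + D.T)
    return np.block([[A - B @ W0_inv @ C, -B @ W0_inv @ B.T],
                     [C.T @ W0_inv @ C,   -A.T + C.T @ W0_inv @ B.T]])

def imaginary_eigenvalues(M, tol=1e-9):
    """Eigenvalues of M whose real part is numerically zero."""
    mu = np.linalg.eigvals(M)
    scale = max(1.0, np.abs(mu).max())
    return mu[np.abs(mu.real) <= tol * scale]

A = np.array([[-1.0]])
B = np.array([[1.0]])
D = np.array([[0.5]])
M_np = hamiltonian(A, B, np.array([[-1.0]]), D)   # H(s) = -1/(s+1) + 0.5
M_p = hamiltonian(A, B, np.array([[1.0]]), D)     # H(s) = +1/(s+1) + 0.5

n = A.shape[0]
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])

print(np.allclose(J @ M_np, (J @ M_np).T))   # structure check (5.17)
print(imaginary_eigenvalues(M_np))           # crossings of the non-passive model
print(imaginary_eigenvalues(M_p))            # none for the strictly passive model
```

For the first model the Hamiltonian eigenvalues are $\pm j$, matching the frequency $\omega_0 = 1$ at which $\Psi(j\omega) = 1 - 2/(1+\omega^2)$ vanishes; for the second model they are real, $\pm\sqrt{3}$.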


Figure 5.2: Illustration of the relationship between Hamiltonian eigenvalues $\mu_k$ (left) and eigenvalues $\lambda_i(j\omega)$ of $\Psi(j\omega)$ (right). In the left panel, purely imaginary Hamiltonian eigenvalues are denoted with circles (the number of circles denotes multiplicity) to distinguish them from other eigenvalues (squares); only eigenvalues with nonnegative imaginary part are shown. In the right panel, the non-passive frequency bands $\Omega_2 = (\omega_2, \omega_3)$ and $\Omega_4 = (\omega_4, \omega_5)$ are highlighted with a thick line, together with the corresponding local minima.


5.3 Checking passivity 


There are two main approaches for checking whether a given state-space model (5.1) 
is passive. These methods exploit the Positive Real Lemma (5.10) and the properties of 
the Hamiltonian matrix (5.16), respectively. 


5.3.1 Checking passivity via linear matrix inequalities 


The Positive Real Lemma discussed in Section 5.2.1.1 states that system (5.1) is passive if and only if (5.10) holds. Note that this condition also embeds a stability check as a corollary, since restricting (5.10) to its upper-left block, which corresponds to setting $u = 0$, i.e., considering the zero-input response, results in the popular Lyapunov condition for (simple) stability, here restricted to the LTI case, which reads

$$\exists P = P^T > 0 : \quad A^T P + PA \le 0. \tag{5.18}$$

Both (5.18) and (5.10) are recognized as LMI feasibility (convex) problems. As such, they can be readily solved by specialized LMI convex optimization software, such as SeDuMi [77] and Yalmip [55]. For a survey of tools see [49], part 6. If the system is passive, such tools will return a Lyapunov matrix $P$ for which the above conditions are satisfied. Otherwise, they will return a certificate of non-feasibility, thus proving the existence of passivity violations.

The main advantage of the LMI-based passivity check is simplicity: one does not have to write any particular code, since most LMI solvers have simple-to-use interfaces. This advantage is counterbalanced by two important drawbacks. The first disadvantage is computational complexity. The PRL in (5.10) requires proving semidefiniteness of a matrix of size $M + N$, with $P$ being unknown. The number of decision variables is $N(N+1)/2$, i.e., the number of independent elements of $P$. A direct implementation thus requires a computational cost that scales as $O(N^6)$, although advanced solvers exist that can reduce this cost to $O(N^4)$ [82]. Exploitation of sparsity, structure and symmetries can be used to reduce this cost even further in many practical cases (for an example see [21]).

The second main disadvantage of LMI-based passivity checks is in the binary na- 
ture of their output (passive/non-passive). If the system is not passive, no additional 
information is available from the solver that can be exploited to fix the passivity viola- 
tion by a suitable perturbation process. Fortunately, this information is available from 
the Hamiltonian-based passivity check, discussed next. 


5.3.2 Checking passivity using Hamiltonian eigenvalues 


As discussed in Section 5.2.1.3, an asymptotically stable system (5.1) with $D + D^T > 0$ is passive if the Hamiltonian matrix $M$ defined in (5.16) does not have purely imaginary eigenvalues with odd-sized Jordan blocks (in the vast majority of cases these eigenvalues, if any, are simple). This suggests a simple algorithm for checking passivity, summarized as pseudocode in Algorithm 5.1. This scheme is formulated so that it provides as output some additional information, in particular the frequency bands where (5.12) is violated [33]. This information will prove very useful in Section 5.5 for removing such passivity violations via perturbation.

As a first step, we form the Hamiltonian matrix and we compute its eigenvalues $\mu_k \in \mathrm{eig}(M)$. If no such eigenvalues are purely imaginary, and if $D + D^T > 0$, then the model is concluded to be strictly passive and the algorithm stops. In this case no eigenvalue trajectory $\lambda_i(j\omega)$ can cross zero, since any such crossing would be pinpointed by some imaginary Hamiltonian eigenvalue.

The more interesting case occurs in the presence of purely imaginary eigenvalues $\mu_k = j\omega_k$. Let us extract the subset of these eigenvalues with nonnegative imaginary part (recall that, if $j\omega_k$ is an eigenvalue, then $-j\omega_k$ is also an eigenvalue due to the Hamiltonian structure of $M$) and sort them in ascending order,

$$0 = \omega_0 < \omega_1 < \omega_2 < \cdots < \omega_K < \omega_{K+1} = +\infty, \tag{5.19}$$

where $\omega_0$ is added even if 0 is not an eigenvalue of $M$, and where we set $\omega_{K+1} = +\infty$. The frequencies $\omega_k$ induce a partition of the frequency axis into disjoint adjacent subbands $\Omega_k = (\omega_k, \omega_{k+1})$ for $k = 0, \ldots, K$. From the above discussion, $\Psi(j\omega)$ is nonsingular $\forall \omega \in \Omega_k$, $\forall k$.

Each subband $\Omega_k$ is then flagged as passive or non-passive by assigning $k$ to the corresponding index set $\mathcal{K}_p$ or $\mathcal{K}_{np}$, respectively, depending on whether (5.12) is verified or not for $\omega \in \Omega_k$. This condition is very easy to check, due to the continuity of all eigenvalue trajectories $\lambda_i(j\omega)$, which is a consequence of the assumed asymptotic stability, so that both $H(s)$ and $\Psi(s)$ are regular on the imaginary axis. It is thus sufficient to check whether

$$\Psi(j\bar{\omega}_k) > 0, \quad \text{where} \quad \bar{\omega}_k = \frac{\omega_k + \omega_{k+1}}{2} \tag{5.20}$$

is the midpoint of band $\Omega_k$. If (5.20) is verified, then the model is uniformly passive in $\Omega_k$ and $k \in \mathcal{K}_p$. Otherwise, $k \in \mathcal{K}_{np}$ and the number of negative eigenvalues $\lambda_i(j\omega)$ (which is constant in $\Omega_k$) is determined based on their evaluation at the midpoint $\bar{\omega}_k$.

As a final optional step, the subbands $\Omega_k$ with $k \in \mathcal{K}_{np}$ can be subjected to a local (adaptive) sampling in order to find all local minima of the eigenvalue trajectories $\lambda_i(j\omega)$. These minima, each identified by its frequency location and value, correspond to the worst-case local passivity violations. See Figure 5.2 for a graphical illustration. See also [20, 34].

The computational cost of the passivity check in Algorithm 5.1 is dominated by the Hamiltonian eigenvalue evaluation. A general-purpose eigenvalue solver scales as $O(\kappa N^3)$ where $\kappa$ is a constant, since the size of the Hamiltonian matrix (5.16) is $2N$. Specialized eigensolvers exist that reduce this cost by exploiting the particular matrix structure [7, 10]; see also [1, 9, 79]. These still retain the scaling $O(\kappa N^3)$, albeit with a smaller constant $\kappa$. If the transfer function $H(s)$ of the model is symmetric (which is usually verified in electrical and electronic applications), additional computational savings can be achieved by defining equivalent and smaller-size eigenproblems, often referred to as half-size passivity tests; see [23, 36, 47, 48, 75].

When the number of states $N$ is medium-large and the state-space realization is sparse (for instance with $A$ diagonal or quasi-diagonal), then it is more convenient to use eigensolvers based on repeated shift-invert iterations; see e.g. [3, 41, 85]. It has been demonstrated that these methods are able to reduce the scaling law of purely imaginary eigenvalue determination to $O(\kappa N)$, although with a possibly large $\kappa$. See also [8, 60, 85] for details on more general structured eigenproblems.

Algorithm 5.1: Hamiltonian-based passivity check.

Require: real state-space matrices $A$, $B$, $C$, $D$
Require: $A$ asymptotically stable, $D + D^T$ nonsingular
form the Hamiltonian matrix $M$ of (5.16) and compute its eigenvalues $\mu_k$
if no eigenvalue is purely imaginary and $D + D^T > 0$ then
    system is strictly passive: exit
end if
extract all imaginary eigenvalues $\mu_k = j\omega_k$ and sort them as in (5.19)
set $\mathcal{K}_p = \mathcal{K}_{np} = \emptyset$
for $k = 0, \ldots, K$ do
    form subband $\Omega_k = (\omega_k, \omega_{k+1})$ and its midpoint $\bar{\omega}_k$
    if $\Psi(j\bar{\omega}_k) > 0$ then
        system is locally passive $\forall \omega \in \Omega_k$, add $k$ to $\mathcal{K}_p$
    else
        system is not passive in $\Omega_k$, add $k$ to $\mathcal{K}_{np}$
        find all local minima of the eigenvalues of $\Psi(j\omega)$ in $\Omega_k$, with their frequency locations
    end if
end for
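
The main steps of Algorithm 5.1 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not a reference implementation: the function names are hypothetical, a dense general-purpose eigensolver is used, repeated imaginary eigenvalues are merged by rounding, the test frequency in the last (unbounded) band is picked arbitrarily as $2\omega_K + 1$, and the optional search for local minima is omitted.

```python
import numpy as np

def hamiltonian(A, B, C, D):
    """Hamiltonian matrix M of (5.16)."""
    W0_inv = np.linalg.inv(D + D.T)
    return np.block([[A - B @ W0_inv @ C, -B @ W0_inv @ B.T],
                     [C.T @ W0_inv @ C,   -A.T + C.T @ W0_inv @ B.T]])

def psi(A, B, C, D, w):
    """Psi(jw) = H(jw) + H(jw)^H."""
    n = A.shape[0]
    H = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
    return H + H.conj().T

def passivity_check(A, B, C, D, tol=1e-8):
    """Return (is_passive, bands); bands is a list of (w_lo, w_hi, passive)."""
    mu = np.linalg.eigvals(hamiltonian(A, B, C, D))
    scale = max(1.0, np.abs(mu).max())
    on_axis = mu[np.abs(mu.real) <= tol * scale]
    if on_axis.size == 0 and np.linalg.eigvalsh(D + D.T).min() > 0:
        return True, []                      # strictly passive: exit
    # Positive crossing frequencies, sorted, with repeated values merged
    w = on_axis.imag[on_axis.imag > tol * scale]
    w = np.unique(np.round(w, 10))
    edges = np.concatenate(([0.0], w, [np.inf]))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Midpoint for bounded bands; arbitrary interior point for the last one
        w_test = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo + 1.0
        passive = np.linalg.eigvalsh(psi(A, B, C, D, w_test)).min() > 0
        bands.append((float(lo), float(hi), bool(passive)))
    return all(b[2] for b in bands), bands

# Hypothetical non-passive model H(s) = -1/(s+1) + 0.5
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[-1.0]])
D = np.array([[0.5]])
is_passive, bands = passivity_check(A, B, C, D)
print(is_passive, bands)
```

For this model the single crossing frequency is $\omega_1 = 1$, so the check reports a non-passive band $(0, 1)$ followed by a passive band $(1, +\infty)$.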


5.4 System perturbation 


Assuming that the system (5.1) is detected as non-passive from a passivity check, the 
main question arises whether we can enforce its passivity through a small perturba- 
tion of its coefficients. What is actually important is not the amount of coefficient per- 
turbation, but rather the perturbation in the model response, which should be kept 
under control in order to maintain model accuracy. Of course, this approach makes 
sense only if the passivity violations of the initial model are relatively small to enable 
correction via perturbation. This situation is in fact commonly encountered in appli- 
cations. Very large passivity violations in models that should represent dissipative sys- 
tems are a clear indication of poor model quality. Such models should be discarded 
and regenerated. 

Several perturbation approaches are possible for (5.1). In the following, we focus on one particular strategy, which amounts to perturbing only the state-output matrix as $\hat{C} = C + \delta C$ while leaving the other state-space matrices unchanged. This strategy induces the following perturbation in the transfer function:

$$\hat{H}(s) = H(s) + \delta H(s) \quad \text{with} \quad \delta H(s) = \delta C\,(sI - A)^{-1} B. \tag{5.21}$$

The corresponding impulse response perturbation is thus

$$\hat{h}(t) = h(t) + \delta h(t) \quad \text{with} \quad \delta h(t) = \delta C\, e^{At} B\, u(t), \tag{5.22}$$

where $u$ is the Heaviside step function. This approach leaves the state matrix $A$ unchanged, thus preserving the system poles. Allowing for a perturbation of the poles induced by a modification of $A$ would in fact require additional constraints for ensuring that stability is not compromised. Except for very few cases [22, 53], most existing passivity enforcement schemes do not modify matrix $A$ in order to preserve the system poles. This is a common scenario in those applications where passivity enforcement is applied as a post-processing of a model identified from measurements with Vector Fitting (see [5, Chapter 8]). If the system is asymptotically passive with $D + D^T > 0$, there is also no need to modify the direct coupling matrix $D$. Modification of the input-state map $B$ can be considered as an alternative to (5.21).

Based on (5.21), we need to determine a cost function that measures the perturbation amount in terms of the decision variables, i.e., the elements of $\delta C$. A number of popular cost functions are reviewed below.


5.4.1 Gramian-based cost functions 


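A natural choice for measuring the system perturbation is the $L_2$ norm. We have

$$\mathcal{E}^2 = \|\delta h\|_{L_2}^2 = \int_0^{+\infty} \mathrm{tr}\left(\delta h(t)\, \delta h(t)^T\right) dt = \mathrm{tr}\left(\delta C\, G_c\, \delta C^T\right), \tag{5.23}$$

where tr is the trace of its matrix argument, and

$$G_c = \int_0^{+\infty} e^{At} B B^T e^{A^T t}\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} (j\omega I - A)^{-1} B B^T (j\omega I - A)^{-H}\, d\omega \tag{5.24}$$

is the Controllability Gramian of the system, which is easily found by solving the Lyapunov equation

$$A G_c + G_c A^T = -B B^T. \tag{5.25}$$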

Although simple to use, the cost function (5.23) is seldom used in applications. This is due to the fact that most often a reduced order model is obtained from some approximation process that ensures accuracy only in a well-defined frequency band, which usually does not extend up to $\infty$. Assuming that the model accuracy is of interest only for $\omega \in [0, \omega_{\max}]$, then it is unnecessary and even detrimental to include any contribution to the Gramian coming from frequencies $|\omega| > \omega_{\max}$. A simple approach to obtain a bandlimited norm is to limit the integration bounds in (5.24) to $\pm\omega_{\max}$. One loses the possibility to compute $G_c$ through (5.25), so that the corresponding bandlimited Gramian should be obtained by a direct numerical integration of (5.24) through some quadrature rule.

An alternative option is to introduce a nonnegative weighting function $\rho(\omega)$ in the frequency-domain integral (5.24), which allows one to fine-tune the contributions to the Gramian coming from different frequencies. The latter strategy lends itself to a simple algebraic procedure for the weighted Gramian computation, in the case the weight is restricted to be in the form of a state-space system applied to the input or to the output of our original transfer function $H(s)$. Some details follow.

Let us consider a weighting function in state-space form with transfer matrix $\Gamma(s) = C_\Gamma (sI - A_\Gamma)^{-1} B_\Gamma + D_\Gamma$ of compatible size, which is used to define a weighted error function

$$\delta H_\Gamma(s) = \delta H(s)\, \Gamma(s). \tag{5.26}$$


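Instead of (5.23), we measure system perturbation through the $\Gamma$-weighted norm defined as

$$\mathcal{E}_\Gamma^2 = \|\delta H\|_\Gamma^2 = \|\delta H_\Gamma\|_{L_2}^2. \tag{5.27}$$

It can be easily shown [99] that this norm can be computed as

$$\|\delta H\|_\Gamma^2 = \mathrm{tr}\left(\delta C\, P_\Gamma\, \delta C^T\right), \tag{5.28}$$

where $P_\Gamma$ is the upper-left block of the solution of the following augmented Lyapunov equation:

$$\tilde{A} \tilde{P} + \tilde{P} \tilde{A}^T = -\tilde{B} \tilde{B}^T, \tag{5.29}$$

$$\tilde{A} = \begin{pmatrix} A & B C_\Gamma \\ 0 & A_\Gamma \end{pmatrix}, \quad \tilde{B} = \begin{pmatrix} B D_\Gamma \\ B_\Gamma \end{pmatrix}, \quad \tilde{P} = \begin{pmatrix} P_\Gamma & P_{12} \\ P_{12}^T & P_{22} \end{pmatrix}. \tag{5.30}$$

We see that this characterization is fully compatible with the standard $L_2$ norm, provided that the standard Gramian $G_c$ is replaced by its weighted counterpart $P_\Gamma$. This formulation can be adapted to applications that require control over relative error, by choosing $\Gamma(s) = H^{-1}(s)$ or even elementwise relative error [42]. If we are interested in retaining accuracy only in some prescribed frequency band $(\omega_{\min}, \omega_{\max})$, then $\Gamma(s)$ can be defined as a band pass filter matched to this band.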
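
The augmented Lyapunov equation (5.29)-(5.30) can be assembled directly from the state-space matrices of the model and of the weight. The following sketch is an illustration under stated assumptions: the helper names are hypothetical, the Lyapunov equation is solved by brute-force vectorization (small problems only), and the weight $\Gamma(s) = 2/(s+2)$ is an arbitrary first-order low-pass filter chosen so that the result can be verified by hand.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T = -Q by vectorization (small problems only)."""
    n = A.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, A) + np.kron(A, I)
    x = np.linalg.solve(lhs, -Q.reshape(-1, order="F"))
    return x.reshape((n, n), order="F")

def weighted_gramian(A, B, Ag, Bg, Cg, Dg):
    """Upper-left block P_Gamma of the augmented solution in (5.29)-(5.30)."""
    n, ng = A.shape[0], Ag.shape[0]
    A_aug = np.block([[A, B @ Cg],
                      [np.zeros((ng, n)), Ag]])
    B_aug = np.vstack([B @ Dg, Bg])
    P_aug = lyap(A_aug, B_aug @ B_aug.T)
    return P_aug[:n, :n]

# Model with (sI - A)^{-1} B = 1/(s+1); hypothetical weight Gamma(s) = 2/(s+2)
A = np.array([[-1.0]])
B = np.array([[1.0]])
Ag = np.array([[-2.0]])
Bg = np.array([[1.0]])
Cg = np.array([[2.0]])
Dg = np.array([[0.0]])

P_gamma = weighted_gramian(A, B, Ag, Bg, Cg, Dg)
print(P_gamma)
```

In this example the weighted response for $\delta C = 1$ is $2/((s+1)(s+2))$, whose impulse response $2(e^{-t} - e^{-2t})$ has squared $L_2$ norm $1/3$, in agreement with the computed $P_\Gamma$.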

The two Gramian-based error characterizations (5.23) and (5.28) are further simplified as follows. Let us consider (5.23), and let us assume that the initial model is controllable, so that the controllability Gramian $G_c$ is full-rank and strictly positive definite.$^4$ Computing the Cholesky factorization $G_c = Q_c^T Q_c$ and inserting it into (5.23) leads to

$$\mathcal{E}^2 = \mathrm{tr}\left(\delta C\, Q_c^T Q_c\, \delta C^T\right) = \mathrm{tr}\left(\Xi\, \Xi^T\right) = \|\Xi\|_F^2 = \|\xi\|_2^2, \tag{5.31}$$

4 In case $G_c$ is singular, a preprocessing step based, e.g., on Balanced Truncation [5, Chapter 2] can be applied to remove any uncontrollable states.

where $\|\cdot\|_F$ denotes the Frobenius norm of its matrix argument, $\Xi = \delta C\, Q_c^T$ and $\xi = \mathrm{vec}(\Xi)$ stacks the columns of $\Xi$ into a single column vector.$^5$ We see that the model perturbation is now cast as the Euclidean norm of the (vectorized) decision variables $\xi$ in a new coordinate system induced by the Cholesky factor of the Gramian. Minimization of (5.31) is thus trivial.
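
The change of variables in (5.31) is easy to verify numerically. In the sketch below the Gramian and the perturbation $\delta C$ are hypothetical values chosen for illustration. Note that `numpy.linalg.cholesky` returns a lower-triangular factor $L$ with $G_c = L L^T$, so the factor $Q_c$ of the text corresponds to $L^T$.

```python
import numpy as np

# Hypothetical Gramian (that of A = diag(-1, -2), B = [1, 1]^T) and perturbation
Gc = np.array([[0.5, 1.0 / 3.0],
               [1.0 / 3.0, 0.25]])
dC = np.array([[0.1, -0.2]])

L = np.linalg.cholesky(Gc)     # Gc = L L^T with L lower triangular
Qc = L.T                       # so that Gc = Qc^T Qc as in the text

Xi = dC @ Qc.T                 # new coordinates, Xi = dC Qc^T
xi = Xi.reshape(-1, order="F") # xi = vec(Xi), column-major stacking

cost_trace = np.trace(dC @ Gc @ dC.T)   # tr(dC Gc dC^T)
cost_norm = xi @ xi                     # ||xi||_2^2

# Inverse map: dC = Xi Qc^{-T}, computed without forming an explicit inverse
dC_back = np.linalg.solve(Qc, Xi.T).T

print(cost_trace, cost_norm)
```

The two cost values coincide, and the last step recovers $\delta C$ from $\Xi$, which is the map used later when the optimization is carried out in the variables $\xi$.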


5.4.2 Data-based cost functions 


A further alternative for defining a cost function that measures the model perturbation error is based on a purely discrete formulation. Let us suppose that the model (5.1) was obtained in the first place through a data-driven MOR scheme, starting from a set of frequency-domain measurements of the underlying system response $(\omega_\ell, H_\ell)$ for $\ell = 1, \ldots, L$. A natural choice would be to minimize the error of the perturbed model with respect to these initial data [21, 43, 44, 46]

$$\mathcal{E}^2 = \sum_{\ell=1}^{L} \rho_\ell^2\, \mathcal{E}_\ell^2 \quad \text{with} \quad \mathcal{E}_\ell^2 = \|H(j\omega_\ell) + \delta H(j\omega_\ell) - H_\ell\|_F^2, \tag{5.32}$$

where we used the Frobenius norm to define the local error $\mathcal{E}_\ell$ for each frequency point (of course other norm choices are possible), and where $\rho_\ell$ is a weighting factor to be defined based on the desired approximation criteria. A straightforward derivation shows that, vectorizing the decision variables as $\delta c = \mathrm{vec}(\delta C)$, we can write

$$\mathcal{E}^2 = \|K \delta c - d\|_2^2 \tag{5.33}$$

with

$$K^T = \left(\rho_1 K_1^T, \ldots, \rho_L K_L^T\right), \quad d^T = \left(\rho_1 d_1^T, \ldots, \rho_L d_L^T\right), \tag{5.34}$$

where the various components are defined using the Kronecker product $\otimes$ as

$$K_\ell = \left[(j\omega_\ell I - A)^{-1} B\right]^T \otimes I, \quad d_\ell = \mathrm{vec}\left(H_\ell - H(j\omega_\ell)\right). \tag{5.35}$$

A particular case of (5.32) is obtained by defining

$$\mathcal{E}_\ell^2 = \|\delta H(j\omega_\ell)\|_F^2. \tag{5.36}$$

This choice corresponds to setting the "target" data samples as the responses of the initial model, so that $H_\ell = H(j\omega_\ell)$ and consequently $d_\ell = 0$. Correspondingly, (5.33) reduces to the simple quadratic form

$$\mathcal{E}^2 = \|K \delta c\|_2^2. \tag{5.37}$$
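
The Kronecker-product construction of (5.34)-(5.35) can be checked against a direct evaluation of (5.36). The sketch below is a hedged illustration: `build_K` is a hypothetical helper, the weights are assumed to multiply each block $K_\ell$ directly, and the model, frequencies and $\delta C$ are arbitrary test values.

```python
import numpy as np

def build_K(A, B, omegas, n_out, rho=None):
    """Stack rho_l * K_l with K_l = [(j w_l I - A)^{-1} B]^T kron I, eq. (5.35)."""
    n = A.shape[0]
    rho = np.ones(len(omegas)) if rho is None else np.asarray(rho, dtype=float)
    blocks = []
    for r, w in zip(rho, omegas):
        X = np.linalg.solve(1j * w * np.eye(n) - A, B)   # (j w I - A)^{-1} B
        blocks.append(r * np.kron(X.T, np.eye(n_out)))
    return np.vstack(blocks)

# Hypothetical test data
A = np.diag([-1.0, -2.0])
B = np.array([[1.0], [1.0]])
omegas = np.array([0.0, 1.0, 3.0])
dC = np.array([[0.1, -0.2]])

K = build_K(A, B, omegas, n_out=dC.shape[0])
dc = dC.reshape(-1, order="F")                 # delta c = vec(delta C)
cost_K = np.linalg.norm(K @ dc) ** 2           # ||K dc||_2^2 as in (5.37)

# Direct evaluation: sum over frequencies of ||dC (j w I - A)^{-1} B||_F^2
cost_direct = sum(
    np.linalg.norm(dC @ np.linalg.solve(1j * w * np.eye(2) - A, B), "fro") ** 2
    for w in omegas
)
print(cost_K, cost_direct)
```

Since $K$ is complex while $\delta c$ is real, a least-squares solver would typically be applied to the real and imaginary parts of $K$ stacked on top of each other; that step is omitted here.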


5 We will denote the inverse operation as $\Xi = \mathrm{mat}(\xi)$, where the size of $\Xi$ is inferred from the context.


5.5 Passivity enforcement


In Section 5.4, we showed how the perturbation of a model based on a modification of the state-output map can be algebraically characterized as a quadratic form of the decision variables, i.e., the elements of $\delta C$, possibly cast in a different coordinate system. The resulting cost function provides an effective control over model perturbation if used within an optimization problem, combined with suitable constraints for passivity enforcement. In this section, we discuss the three most prominent approaches for casting the passivity conditions introduced in Section 5.2.1 as constraints, giving rise to three classes of algorithms for passivity enforcement. An overview of alternative approaches and a more complete treatment is available in [39].


5.5.1 Passivity enforcement via LMI constraints 


Let us consider an initial non-passive system (5.1), for which the PRL condition (5.10) is not satisfied. We try to enforce this condition on a perturbed system, where the state-output matrix $C$ is updated as

$$\hat{C} = C + \delta C = C + \Xi\, Q_c^{-T}, \tag{5.38}$$

where we used the change of variables in (5.31) based on the Cholesky factor $Q_c$ of the controllability Gramian. Enforcing the PRL for the perturbed system while minimizing the perturbation, based, e.g., on the cost function (5.31), amounts to solving the following constrained optimization problem

$$\min_{P,\, \Xi} \|\Xi\|_F^2 \quad \text{s.t.} \quad P = P^T > 0 \quad \text{and} \quad F(P, \Xi) \le 0, \tag{5.39}$$

where

$$F(P, \Xi) = \begin{pmatrix} A^T P + PA & PB - C^T - Q_c^{-1} \Xi^T \\ B^T P - C - \Xi\, Q_c^{-T} & -D - D^T \end{pmatrix}. \tag{5.40}$$

The cost function in (5.39) is a quadratic form in the decision variables, and both constraints are of LMI type [21]. Problem (5.39) is known to be convex, therefore there is a theoretical guarantee that a unique optimal solution exists, which can be found in polynomial time. In fact, specialized solvers for this class of problems exist, see e.g. [55, 77], therefore we do not detail any particular algorithm any further (see also Section 5.3.1). The reader is referred to standard textbooks on convex optimization for more details [14].

As already discussed in Section 5.3.1, the computational cost that is required to solve (5.39) scales quite badly with the number of decision variables, equivalently with the system size. The main reason for these high computational requirements is the presence of the Lyapunov matrix $P$ in the set of decision variables, which is only instrumental to the PRL formulation, but which is not really needed as a result of the optimization process. Therefore, we can think of eliminating $P$ with a suitable preprocessing step, in order to obtain a smaller LMI problem that can be solved more efficiently. The so-called trace parameterization provides a solution to this problem; see [25, 21, 18] for details. We now seek alternatives that provide even better scalability. The reader is encouraged to also see [43].


5.5.2 Passivity enforcement via Hamiltonian perturbation 


Let us consider the Hamiltonian-based passivity constraints discussed in Section 5.2.1.3. Under the assumptions that $A$ is asymptotically stable and system (5.1) is asymptotically passive with $D + D^T > 0$, (strict) passivity holds if the Hamiltonian matrix $M$ in (5.16) has no purely imaginary eigenvalues. If this is not true, as Figure 5.2 shows, the Hermitian part of the frequency response has some negative eigenvalues in some frequency bands, and the system is not passive due to those localized violations.

The main idea of passivity enforcement via Hamiltonian perturbation is to induce a spectral perturbation on the imaginary Hamiltonian eigenvalues, so that they are displaced in the correct direction so as to eliminate the local passivity violations [33]. A graphical illustration of this strategy is provided in Figure 5.3, where we show that when two imaginary eigenvalues are displaced along the imaginary axis in a direction that points toward the interior of each passivity violation band, the extent of the violation is effectively reduced (top panels). If the perturbation amount is sufficiently large to induce a collision of the two imaginary eigenvalues (bottom panels), then a bifurcation occurs and the two eigenvalues move off the imaginary axis. The passivity violation is thus removed.

The above spectral perturbation is an inverse problem, which requires a precise 
characterization of the relation between matrix element perturbations and the corre- 
sponding induced change in the eigenvalues that we need to displace. The algorithm 
that we describe below is based on a first-order approximation of this relation. 

Figure 5.3: Illustration of passivity enforcement via Hamiltonian eigenvalue perturbation. Top and bottom panels refer to two different scenarios that may occur. In the top left panel the purely imaginary Hamiltonian eigenvalues are depicted with empty dots, and their perturbation direction and extent is represented with thick arrows. The corresponding eigenvalue trajectories $\lambda_i(j\omega)$ before (solid lines) and after (dashed lines) perturbation are depicted in the top right panel. Bottom panels show that when two imaginary Hamiltonian eigenvalues collide (left panel), the corresponding intersections of the eigenvalue trajectories $\lambda_i(j\omega)$ with the frequency axis are removed (right panel).

Let us consider once again a non-passive system which is perturbed by changing the state-output matrix as $\hat{C} = C + \delta C$. A straightforward first-order approximation analysis leads to the following expression for the perturbed Hamiltonian matrix:

$$\hat{M} = M + \delta M \quad \text{with} \quad \delta M = \begin{pmatrix} -B W_0^{-1} \delta C & 0 \\ C^T W_0^{-1} \delta C + \delta C^T W_0^{-1} C & \delta C^T W_0^{-1} B^T \end{pmatrix}, \tag{5.41}$$

where $W_0 = D + D^T$. Let us now consider a generic eigenvalue $\mu_k$ of $M$ with unit multiplicity, and let us denote the corresponding right and left eigenvectors as $v_k$ and $w_k$, normalized such that $\|v_k\| = \|w_k\| = 1$. We have the following first-order eigenvalue perturbation result [86]:

$$\hat{\mu}_k = \mu_k + \delta\mu_k \quad \text{with} \quad \delta\mu_k = \frac{w_k^H\, \delta M\, v_k}{w_k^H v_k}. \tag{5.42}$$
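
The first-order results (5.41) and (5.42) can be checked on a small example by comparing the predicted eigenvalue shift with the eigenvalue of the exactly perturbed Hamiltonian matrix. The sketch below is illustrative only: the one-state model, the size of $\delta C$ and the helper name are assumptions, and the left eigenvector is taken as a row of $V^{-1}$, which differs from the normalization in the text only by a scalar factor that cancels in (5.42).

```python
import numpy as np

def hamiltonian(A, B, C, D):
    """Hamiltonian matrix M of (5.16)."""
    W0_inv = np.linalg.inv(D + D.T)
    return np.block([[A - B @ W0_inv @ C, -B @ W0_inv @ B.T],
                     [C.T @ W0_inv @ C,   -A.T + C.T @ W0_inv @ B.T]])

# Hypothetical non-passive model H(s) = -1/(s+1) + 0.5 and a small perturbation
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[-1.0]])
D = np.array([[0.5]])
dC = np.array([[1e-3]])

n = A.shape[0]
W0_inv = np.linalg.inv(D + D.T)
M = hamiltonian(A, B, C, D)
# First-order Hamiltonian perturbation, eq. (5.41)
dM = np.block([[-B @ W0_inv @ dC, np.zeros((n, n))],
               [C.T @ W0_inv @ dC + dC.T @ W0_inv @ C, dC.T @ W0_inv @ B.T]])

mu, V = np.linalg.eig(M)
k = int(np.argmax(mu.imag))          # eigenvalue on the positive imaginary axis
v = V[:, k]                          # right eigenvector
w_h = np.linalg.inv(V)[k, :]         # row k of V^{-1} acts as w_k^H

d_mu = (w_h @ dM @ v) / (w_h @ v)    # first-order shift, eq. (5.42)

mu_exact = np.linalg.eigvals(hamiltonian(A, B, C + dC, D))
mu_exact = mu_exact[np.argmax(mu_exact.imag)]
print(mu[k] + d_mu, mu_exact)
```

For this model the exact crossing frequency after perturbation is $\sqrt{1 - 2\,\delta C}$, while the first-order prediction is $1 - \delta C$, so the two values agree up to a second-order term.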


We now particularize (5.42) to the case of a purely imaginary eigenvalue $\mu_k = j\omega_k$. It is well known that, for such eigenvalues, the left and right eigenvectors are related by $w_k = -J v_k$, where $J$ is defined in (5.17), so that we can write

$$\delta\mu_k = \frac{v_k^H J\, \delta M\, v_k}{v_k^H J\, v_k}. \tag{5.43}$$

Splitting now the right eigenvector as $v_k^T = (v_{k1}^T, v_{k2}^T)$ according to the block structure of $M$, we see that the denominator of (5.43) is purely imaginary

$$v_k^H J\, v_k = 2j\, \Im\{v_{k1}^H v_{k2}\}, \tag{5.44}$$

whereas the numerator is real-valued since $J\,\delta M$ is real and symmetric. A tedious but straightforward calculation leads to

$$v_k^H J\, \delta M\, v_k = 2\, \Re\{v_{k1}^T \otimes y_k^H\}\, \delta c, \tag{5.45}$$

where $\delta c = \mathrm{vec}(\delta C)$ and the auxiliary vector $y_k$ is defined as

$$y_k = W_0^{-1}\left(C v_{k1} + B^T v_{k2}\right). \tag{5.46}$$

Using the above expressions, we can finally rewrite (5.43) as

$$\Re\{v_{k1}^T \otimes y_k^H\}\, \delta c = (\omega_k - \hat{\omega}_k)\, \Im\{v_{k1}^H v_{k2}\}, \tag{5.47}$$

where we used the fact that under the adopted first-order approximation also the perturbed eigenvalue is purely imaginary, $\hat{\mu}_k = j\hat{\omega}_k$. Furthermore, applying the change of variables (5.31) to (5.47) gives

$$z_k^T \xi = \eta_k, \tag{5.48}$$

where

$$z_k^T = \Re\{(Q_c^{-T} v_{k1})^T \otimes y_k^H\}, \quad \eta_k = (\omega_k - \hat{\omega}_k)\, \Im\{v_{k1}^H v_{k2}\}. \tag{5.49}$$

This expression is a linearized constraint that relates the amount of (imaginary) eigenvalue perturbation to the corresponding perturbation on the decision variables $\xi$.

When using (5.48) as a constraint to determine ¢, the desired location for @ needs 
to be provided as input. With reference to Figure 5.3, we see that the direction where 
w, should be perturbed is directly related to the slope Nix of the eigenvalue trajectory 
A;(jw) that vanishes at wy. A heuristic yet effective choice for @, is 


1 


@, = Wr + A(W,,,-W,) fordl, < 0, 
l k = Wk + A(Wky1 — Wk) k (5.50) 


Ox = Wk -Al(Wk-Wpk1) for Al, > 0, 


where the control parameter 0 < a < 1 determines the maximum extent of the pertur- 
bation amount relative to the size of the violation subband. Additional details on how 
to determine the slopes Nix as well as appropriate values of a can be found in [33]. 
Supposing now that multiple eigenvalues $\mu_k = j\omega_k$ for $k = 1,\dots,K$ are to be perturbed concurrently, we need to collect all independent constraints (5.48) so that they are enforced simultaneously. The resulting optimization problem to be solved reads
$$\min_{\xi}\ \|\xi\|_2^2 \quad \text{s.t.} \quad z_k^{\mathsf T}\,\xi = \eta_k, \quad k = 1,\dots,K. \tag{5.51}$$
This is a simple linearly constrained minimum norm problem, whose optimal solution is $\xi_{\rm opt} = Z^{\dagger}\eta$, where $^{\dagger}$ denotes the pseudoinverse [14], with $Z$ and $\eta$ collecting $z_k^{\mathsf T}$ and $\eta_k$ as rows. Compared to the evaluation of the Hamiltonian eigenvalues required to set up the constraints (5.48), the solution of (5.51) has a negligible computational cost.
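As a minimal sketch of the minimum norm solution, the following NumPy snippet solves a small random instance of a linearly constrained problem of the form (5.51). The sizes and the random data are purely illustrative assumptions; in practice the rows of `Z` and the entries of `eta` would come from (5.49).

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 8                       # K constraints, n decision variables (illustrative)
Z = rng.standard_normal((K, n))   # rows play the role of z_k^T
eta = rng.standard_normal(K)      # right-hand sides eta_k

# Minimum norm solution through the pseudoinverse.
xi_opt = np.linalg.pinv(Z) @ eta

# Any other feasible point differs by a null-space component and is longer.
null_dir = rng.standard_normal(n)
null_dir -= np.linalg.pinv(Z) @ (Z @ null_dir)   # remove the row-space part
xi_other = xi_opt + null_dir
```

Both `xi_opt` and `xi_other` satisfy the constraints, but only the former has minimum Euclidean norm.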

Although the solution of (5.51) is straightforward, its passivity constraint is based on a linearization process and is therefore only accurate up to first order. Consequently, the perturbation fraction $\alpha$ should be selected to be small enough for the first-order approximation to be accurate, and multiple iterations may be required to displace all imaginary eigenvalues. Figure 5.3 illustrates two scenarios that may typically occur during iterations, whereas Algorithm 5.2 provides the pseudocode of a possible implementation. The computational cost of this implementation is dominated by the Hamiltonian eigensolution; see the discussion in Section 5.3.2. We remark that, although the optimization problem (5.51) has a closed-form solution at each iteration, the overall iterative scheme in its basic formulation is not guaranteed to converge, since a local perturbation of a few eigenvalues does not guarantee that new imaginary eigenvalues will not occur at other locations. The approach presented in the next section provides a more robust scheme.


Algorithm 5.2: Passivity enforcement via Hamiltonian perturbation.

Require: real state-space matrices A, B, C, D
Require: A asymptotically stable, $D + D^{\mathsf T} > 0$
Require: control parameter $0 < \alpha < 1$ and max iterations $i_{\max}$
run Alg. 5.1 to check passivity, store $\{\omega_k\}$ and non-passive bands $\Omega_k$
compute Gramian $G_c$ or weighted Gramian and its Cholesky factor $Q_c$
set iteration count $i = 0$
while (system not passive and $i < i_{\max}$) do
    $i \leftarrow i + 1$
    compute right eigenvectors $v_k$ and form vectors $z_k$ in (5.49), for all $k$
    define $\hat\omega_k$ as in (5.50) and form $\eta_k$ in (5.49), for all $k$
    solve optimization problem (5.51) for $\xi$
    update state-output map $C \leftarrow C + \Xi\,Q_c^{-\mathsf T}$ where $\Xi = \mathrm{mat}(\xi)$
    run Alg. 5.1 to check passivity, store $\{\omega_k\}$ and non-passive bands $\Omega_k$
end while


Before closing this section we remark that the above discussion was based on the assumption of simple Hamiltonian eigenvalues. A full characterization of the general case with arbitrary higher multiplicity requires knowledge of the complete structure of the possibly multiple Jordan blocks of the Hamiltonian matrix. This discussion is outside the scope of this chapter; the reader is referred to [1, 61] for a complete treatment. We only remark that the presence of defective eigenspaces is structurally unstable to small perturbations, so that the defectivity usually disappears if a small perturbation is applied.

Passivity enforcement via Hamiltonian perturbation was first introduced in [33], followed by various applications [17, 31, 71] and extensions to large-scale systems [41] with possibly frequency-weighted accuracy norms [42]. It is worth mentioning the straightforward extension [11, 57] to so-called negative imaginary systems.⁶


5.5.3 Passivity enforcement via local perturbation 


Let us consider once again a system that is detected as non-passive by Algorithm 5.1. One of the results that this algorithm provides, in addition to the index set $k \in K_{np}$ that identifies the non-passive bands $\Omega_k = (\omega_k, \omega_{k+1})$, is a set of local minima $(\omega_{\nu k}, \lambda_{\nu k})$ of the eigenvalues of $\Psi(j\omega)$ in each of these subbands. Figure 5.4 depicts these local minima with filled dots.


Figure 5.4: Illustration of passivity enforcement via local perturbation. Linearized constraints are used to perturb (thick arrows) the local minima $\lambda_{\nu k}$ (filled dots) of the eigenvalue trajectories $\lambda_i(j\omega)$ (solid lines) so that they become nonnegative. The resulting perturbed eigenvalue trajectories (dashed lines) are uniformly positive after a few iterations.


Assume now that the system is perturbed through the usual state-output matrix as $\hat C = C + \delta C$. This perturbation induces a perturbation of the eigenvalue trajectories $\lambda_i(j\omega)$, represented in Figure 5.4 by solid lines. We seek a constraint that displaces the local minima to a new nonnegative value [72, 73]. Denoting by $v_{\nu k}$ the eigenvector of $\Psi(j\omega_{\nu k})$, normalized as $\|v_{\nu k}\| = 1$, corresponding to the eigenvalue $\lambda_{\nu k}$, we can express the induced eigenvalue perturbation through the following first-order approximation, which results in an inequality constraint after imposing nonnegativity:
$$\hat\lambda_{\nu k} = \lambda_{\nu k} + v_{\nu k}^{\mathsf H}\,\delta\Psi(j\omega_{\nu k})\,v_{\nu k} \ge 0. \tag{5.52}$$


6 A system with square, strictly proper and stable transfer matrix H(s) is negative imaginary if and 
only if sH(s) is Positive Real. 
Figure 5.4 provides a graphical illustration of the perturbation, together with the expected perturbed eigenvalue trajectories (dashed lines). Note that these trajectories remain continuous after perturbation, thanks to the assumed asymptotic stability of the model (no poles on the imaginary axis).

The constraint (5.52) can now be readily expressed in terms of our decision variables $\delta C$, noting that
$$\delta\Psi(j\omega) = \delta C\,T(j\omega) + T^{\mathsf H}(j\omega)\,\delta C^{\mathsf T}, \tag{5.53}$$
where $T(j\omega) = (j\omega I - A)^{-1}B$. Using the vectorized form $\delta c = \mathrm{vec}(\delta C)$ together with the change of variable (5.31) leads to
$$z_{\nu k}^{\mathsf T}\,\xi \ge -\lambda_{\nu k} \quad \text{with} \quad z_{\nu k}^{\mathsf T} = 2\,\Re\{(Q_c^{-\mathsf T}\,T(j\omega_{\nu k})\,v_{\nu k})^{\mathsf T} \otimes v_{\nu k}^{\mathsf H}\}. \tag{5.54}$$
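The first-order relation behind (5.52) and (5.53) can be checked numerically. The sketch below is an illustration under stated assumptions: the small state-space model is random, the constraint vector acts directly on $\mathrm{vec}(\delta C)$ (that is, the change of variable (5.31) is skipped, as if $Q_c = I$), and column-major vectorization is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 4, 2
A = -np.diag([1.0, 2.0, 3.0, 4.0])   # stable state matrix (illustrative)
B = rng.standard_normal((N, P))
C = rng.standard_normal((P, N))
D = 0.1 * np.eye(P)
w = 1.5                               # frequency at which the constraint is formed


def psi(Cmat):
    """Psi(jw) = H(jw) + H(jw)^H for the immittance case."""
    H = Cmat @ np.linalg.solve(1j * w * np.eye(N) - A, B) + D
    return H + H.conj().T


lam, V = np.linalg.eigh(psi(C))
lam0, v = lam[0], V[:, 0]             # smallest eigenvalue and unit-norm eigenvector

T = np.linalg.solve(1j * w * np.eye(N) - A, B)
# Row vector such that v^H dPsi v = z @ vec(dC), with column-major vec.
z = 2.0 * np.real(np.kron(T @ v, v.conj()))

dC = 1e-7 * rng.standard_normal((P, N))
predicted = z @ dC.flatten(order="F")
actual = np.linalg.eigvalsh(psi(C + dC))[0] - lam0
```

For a sufficiently small `dC`, `predicted` and `actual` agree up to second-order terms, which is what makes the linear inequality in (5.54) a usable surrogate for the true eigenvalue constraint.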


As a result, we cast our minimum model perturbation subject to local passivity constraints as
$$\min_{\xi}\ \|\xi\|_2^2 \quad \text{s.t.} \quad z_{\nu k}^{\mathsf T}\,\xi \ge -\lambda_{\nu k} \quad \forall k \in K_{np},\ \forall \nu. \tag{5.55}$$
This problem is convex and is readily solved through off-the-shelf software. Based on the analysis in [43], the computational cost for solving (5.55) can be reduced to $O(KNM^2)$.
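To make the structure of a problem like (5.55) concrete, the following sketch solves a tiny instance with Hildreth's dual coordinate ascent. This is a swapped-in textbook method chosen only because it fits in a few lines of NumPy; it is not the solver of [43], and a production code would rather call an off-the-shelf convex solver. The function name `min_norm_inequality` and the data are hypothetical.

```python
import numpy as np


def min_norm_inequality(Z, b, sweeps=500):
    """Minimize ||xi||^2 subject to Z @ xi >= b (Hildreth's dual coordinate ascent).

    The primal solution is xi = Z^T mu, with nonnegative multipliers mu.
    """
    G = Z @ Z.T
    mu = np.zeros(len(b))
    for _ in range(sweeps):
        for i in range(len(b)):
            # Residual of constraint i with the contribution of mu[i] removed.
            r = b[i] - G[i] @ mu + G[i, i] * mu[i]
            mu[i] = max(0.0, r / G[i, i])
    return Z.T @ mu


Z = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0]])
b = np.array([1.0, 3.0])   # plays the role of -lambda at two local minima
xi = min_norm_inequality(Z, b)
```

In this example only the second constraint is active at the optimum, so the first multiplier converges to zero and the solution is the minimum norm point on the hyperplane of the second constraint.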

As for the Hamiltonian perturbation passivity enforcement, the above local perturbation is not guaranteed to achieve a passive model after the solution of (5.55). In fact:
- the inequality constraint in (5.54) is only first-order accurate and does not guarantee that the perturbed eigenvalue will be nonnegative after applying the computed model correction;
- it is not guaranteed that a local perturbation of all local eigenvalue minima $(\omega_{\nu k}, \lambda_{\nu k})$ will not induce new passivity violations at new locations, in terms of new negative eigenvalue minima.


The first problem can be easily addressed by embedding (5.55) within an iterative 
scheme that, after solving (5.55), applies model correction and repeats the perturba- 
tion until all local eigenvalue minima are nonnegative. The second problem is also 
easily addressed by the so-called robust iterations, described next. 

Assume that after model perturbation a new local eigenvalue minimum $\lambda_{\rm new} < 0$ is detected at some frequency $\omega_{\rm new}$ where the model was locally passive before perturbation. If we could enforce the eigenvalues of $\Psi(j\omega_{\rm new})$ to remain nonnegative through an additional constraint together with those in (5.55), then the new violation would not have arisen. This is exactly the main idea of robust iterations, where problem (5.55) is solved only as a preliminary step. All new violations are collected, and nonnegativity constraints are formulated as in (5.54) at the corresponding frequencies and added to the set of already available constraints. This prediction step is repeated until no new violations are introduced. Then iterations continue after the model is updated.

The passivity enforcement scheme based on local perturbations (without robust 
iterations) is outlined as pseudocode in Algorithm 5.3. More details on the robust iter- 
ation scheme are available in [44, 45]. 


Algorithm 5.3: Passivity enforcement via local perturbations.

Require: real state-space matrices A, B, C, D
Require: A asymptotically stable, $D + D^{\mathsf T} > 0$
Require: max iterations $i_{\max}$
run Alg. 5.1 to check passivity, store local eigenvalue minima $(\omega_{\nu k}, \lambda_{\nu k})$
compute Gramian $G_c$ or weighted Gramian and its Cholesky factor $Q_c$
set iteration count $i = 0$
while (system not passive and $i < i_{\max}$) do
    $i \leftarrow i + 1$
    compute eigenvectors $v_{\nu k}$ and form vectors $z_{\nu k}$ in (5.54), for all $k$, $\nu$
    solve optimization problem (5.55) for $\xi$
    update state-output map $C \leftarrow C + \Xi\,Q_c^{-\mathsf T}$ where $\Xi = \mathrm{mat}(\xi)$
    run Alg. 5.1 to check passivity, store local eigenvalue minima $(\omega_{\nu k}, \lambda_{\nu k})$
end while


5.6 Extensions 


The various passivity check and enforcement algorithms discussed in previous sections were restricted to the narrow class of regular state-space systems (5.1), with A asymptotically stable, with $D + D^{\mathsf T} > 0$, and with a supply rate defined by (5.7). In this section, we relax these assumptions by providing suitable generalizations.


5.6.1 Releasing asymptotic passivity requirements 


When $W_0 = D + D^{\mathsf T}$ is singular but positive semidefinite, the system might still be passive (although not strictly passive), with at least one of the eigenvalues $\lambda_i(j\omega)$ of $\Psi(j\omega)$ vanishing for $\omega \to \infty$. In this scenario, the passivity check based on the Hamiltonian matrix M in (5.16) cannot be performed, since M is ill-defined and cannot be constructed.

The Hamiltonian matrix can, however, be generalized [93] by avoiding the inversion of $W_0$ in (5.15). Retaining the vector v and adding it as an additional block component to the eigenvector in (5.16) leads to the following generalized eigenvalue problem:
$$\underbrace{\begin{pmatrix} A & 0 & B \\ 0 & -A^{\mathsf T} & -C^{\mathsf T} \\ C & B^{\mathsf T} & W_0 \end{pmatrix}}_{M} \begin{pmatrix} r \\ q \\ v \end{pmatrix} = s_0\, \underbrace{\begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix}}_{N} \begin{pmatrix} r \\ q \\ v \end{pmatrix}. \tag{5.56}$$

The pencil (M, N) has at least one infinite eigenvalue due to the singularity of $W_0$. However, by the same argument used in Section 5.2.1.3, the finite purely imaginary eigenvalues $\mu_k = j\omega_k$ of the pencil still correspond to the frequencies $\omega_k$ where one eigenvalue of $\Psi$ vanishes as $\lambda_i(j\omega_k) = 0$. Therefore, the passivity check detailed in Section 5.3.2 and Algorithm 5.1 can still be applied, provided that the Hamiltonian matrix eigenvalue problem (5.16) is replaced by (5.56). Alternative approaches for handling this case, based on frequency transformations, can be found in [23, 72, 76].
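A small numerical sketch of the extended pencil is given below. The scalar model $H(s) = -1/(s+1) + 0.25$ is an illustrative assumption, chosen because $\Psi(j\omega) = 0.5 - 2/(1+\omega^2)$ vanishes at $\omega = \sqrt{3}$. To stay within plain NumPy, the finite eigenvalues are extracted through a shift-and-invert transformation; a generalized (QZ-type) or structure-preserving eigensolver would be the more natural choice in practice.

```python
import numpy as np

# Illustrative scalar model H(s) = -1/(s+1) + 0.25 (values not taken from the text).
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[-1.0]])
D = np.array([[0.25]])
n = A.shape[0]

W0 = D + D.T
M = np.block([[A,                np.zeros((n, n)), B],
              [np.zeros((n, n)), -A.T,             -C.T],
              [C,                B.T,              W0]])
Npen = np.zeros_like(M)
Npen[:2 * n, :2 * n] = np.eye(2 * n)

# Shift-and-invert: eigenvalues theta of (M - sigma N)^{-1} N equal 1 / (mu - sigma),
# and theta = 0 corresponds to an infinite eigenvalue of the pencil.
sigma = 1.0                                   # must not be an eigenvalue of the pencil
theta = np.linalg.eigvals(np.linalg.solve(M - sigma * Npen, Npen))
finite = sigma + 1.0 / theta[np.abs(theta) > 1e-10]

# Purely imaginary finite eigenvalues mark the frequencies where Psi(jw) is singular.
crossings = np.sort(finite.imag[np.abs(finite.real) < 1e-8])
```

The same construction applies unchanged when $W_0$ is singular, since $W_0$ is never inverted.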


5.6.2 Enforcing asymptotic passivity 


When $W_0 = D + D^{\mathsf T}$ is not sign definite, with at least one negative eigenvalue, most of the foregoing results do not apply unless they are properly generalized. For instance, the PRL condition (5.10) cannot be satisfied, since the model is not passive at infinite frequency. Therefore, the proposed system perturbation (5.21) for passivity enforcement will not be effective, since matrix D should also be modified.

There are two main alternative approaches to recovering strict asymptotic passivity and enabling all passivity enforcement schemes discussed in Section 5.5. One approach involves a preprocessing step that first modifies D so that its symmetric part is strictly positive definite. Let us compute the following eigendecomposition:
$$\frac{D + D^{\mathsf T}}{2} = V \Lambda V^{\mathsf T}, \tag{5.57}$$
where $\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_P)$ collects the eigenvalues and $V$ the corresponding eigenvectors. We can simply redefine the eigenvalues in (5.57) as $\hat\lambda_i = \max(\lambda_i, \varepsilon)$, where $\varepsilon > 0$ is a prescribed positive minimum value assigned to the eigenvalues. The resulting model
$$H_{ap}(s) = C(sI - A)^{-1}B + \hat D, \qquad \hat D = V\,\mathrm{diag}(\hat\lambda_1,\dots,\hat\lambda_P)\,V^{\mathsf T} + \frac{D - D^{\mathsf T}}{2} \tag{5.58}$$
is guaranteed to be asymptotically passive. This new model $H_{ap}(s)$ may exhibit a large deviation with respect to the original model response $H(s)$, since a constant term is added that affects the response at all frequencies. This accuracy loss can be partially compensated by a standard state-output matrix correction $\hat C = C + \delta C$, where $\delta C$ is determined through
$$\min_{\delta C}\ \left\|\delta C\,(j\omega I - A)^{-1}B + \hat D - D\right\|^2, \tag{5.59}$$
where the norm is defined, e.g., as a data-based cost function at discrete frequencies $\omega_k$, as in Section 5.4.2.
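The preprocessing step (5.57) and (5.58) amounts to clipping the eigenvalues of the symmetric part of D while leaving its skew-symmetric part untouched. A minimal NumPy sketch follows; the function name `enforce_asymptotic_passivity` and the sample matrix are assumptions made for the illustration.

```python
import numpy as np


def enforce_asymptotic_passivity(D, eps=1e-3):
    """Return D_hat whose symmetric part has all eigenvalues >= eps, as in (5.57)-(5.58)."""
    sym = 0.5 * (D + D.T)
    skew = 0.5 * (D - D.T)
    lam, V = np.linalg.eigh(sym)          # eigendecomposition of the symmetric part
    lam_hat = np.maximum(lam, eps)        # redefine eigenvalues below the threshold
    return V @ np.diag(lam_hat) @ V.T + skew


D = np.array([[0.5, 1.0],
              [0.2, -0.1]])               # symmetric part has one negative eigenvalue
D_hat = enforce_asymptotic_passivity(D, eps=1e-3)
```

The subsequent compensation step (5.59) is a separate least-squares fit and is not part of this sketch.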
A second (preferable) approach for handling models that are not asymptotically passive amounts to:
1. Allowing for a perturbation of the direct coupling matrix $\hat D = D + \delta D$ in addition to the usual state-output map. The model perturbation thus becomes
$$\delta H(s) = \delta C\,(sI - A)^{-1}B + \delta D = \begin{pmatrix} \delta C & \delta D \end{pmatrix} \begin{pmatrix} (sI - A)^{-1}B \\ I \end{pmatrix}, \tag{5.60}$$
which is compatible with all previous derivations with obvious modifications. Note that, in this case, the Gramian-based cost functions become ill-defined since the $\mathcal{L}_2$ norm of $\delta H(s)$ is not finite, and a data-based cost function over a limited bandwidth, such as (5.32), should be used during passivity enforcement.

2. Including an explicit local passivity constraint at $\omega = \infty$ during passivity enforcement. This constraint is just a simple particular case of (5.52), where $\delta D + \delta D^{\mathsf T}$ replaces $\delta\Psi(j\omega)$.


We leave details of the above generalization to the reader.


5.6.3 Descriptor systems 


Many model order reduction methods lead to systems in descriptor form
$$\begin{cases} E\dot x(t) = A x(t) + B u(t),\\ y(t) = C x(t) + D u(t), \end{cases} \tag{5.61}$$
with a possibly singular matrix E, and often with D = 0, instead of the regular state-space form (5.1). A fundamental requirement to avoid an ill-defined (non-solvable) model is that the pencil (A, E) is regular, with $|sE - A| \neq 0$ for some $s \in \mathbb{C}$. In the following, we only discuss the case of impulse-free or, equivalently, index-one systems, for which the transfer function
$$H(s) = C(sE - A)^{-1}B + D \tag{5.62}$$
has a finite asymptotic value $H_\infty = \lim_{s\to\infty} H(s)$. Descriptor systems with higher index require a special treatment⁷ which is outside the scope of this chapter. See [58, 84, 91, 96, 98] for details.


7 Index-two systems can be passive with a positive real transfer function H(s) only when the leading asymptotic term $H(s) \sim sL_\infty$ for $s \to \infty$ is such that $L_\infty = L_\infty^{\mathsf T} \ge 0$. Higher index systems are not passive and, in order to recover passivity, the high order impulsive part must be deflated.
For index-one descriptor systems the Hamiltonian-based passivity check is applicable with a minimal modification [91, 96, 98]. In fact, repeating the derivations of Sections 5.2.1.3 and 5.6.1 while using (5.61) as a starting point leads to the same generalized eigenvalue problem (5.56), but with N redefined as
$$N = \begin{pmatrix} E & 0 & 0 \\ 0 & E^{\mathsf T} & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{5.63}$$

Special care should be taken in the (generalized) Hamiltonian eigenvalue computation, for which structured eigensolvers should be preferred to general-purpose eigensolvers; see, e.g., the implicitly restarted Krylov method of [59].

Passivity enforcement of descriptor systems via Hamiltonian eigenvalue perturba- 
tion is discussed in [83, 84, 91, 96, 97, 98] and further generalized to para-Hermitian 
pencils in [16]. The Gramian-based cost function for minimizing model perturbation 
of Section 5.4.1 should also be properly generalized; see [78, 84] and [5, Chapter 2] for 
details. Finally, we refer the reader to [30] for an extension of the Positive Real Lemma 
to descriptor systems. 


5.6.4 Other supply rates 


All above derivations and algorithms assume that the supply rate s(u,y) through 
which power is delivered to the system from the environment is given by (5.7). How- 
ever, this is not the only possible choice in general application fields. We review below 
the notable cases of scattering representations and general quadratic supply rates, 
discussing the various modifications that are required to define, check, and enforce 
passivity. 


5.6.4.1 Scattering representations and bounded realness 


The scattering representation is the most appropriate description of models in several application fields, in particular high-frequency electronics and electromagnetics. This is due to a number of reasons, including regularity and boundedness of the transfer function, as well as the ability to measure it with high accuracy. In scattering representations, the inputs u and outputs y are related to the power flow that is incident on and reflected by the structure. In particular, the supply rate is defined as
$$s(u, y) = u^{\mathsf T}u - y^{\mathsf T}y = \|u\|^2 - \|y\|^2 \tag{5.64}$$
and is interpreted as the net power transferred to the system from the environment, with the term $\|u\|^2$ denoting the power flow incident into the system and $\|y\|^2$ the corresponding power flow that is reflected or scattered back into the environment [2, 4, 90].
The supply rate s(u, y) in (5.64) leads to a set of passivity⁸ conditions that are listed below, and which are obtained by repeating the derivations of Section 5.2.1 while applying the appropriate modifications.

The KYP Lemma for scattering representations is known as the Bounded Real Lemma (BRL) [2, 74] and states that a scattering state-space system (5.1) is passive if and only if
$$\exists P = P^{\mathsf T} > 0: \quad \begin{pmatrix} A^{\mathsf T}P + PA + C^{\mathsf T}C & PB + C^{\mathsf T}D \\ B^{\mathsf T}P + D^{\mathsf T}C & -(I - D^{\mathsf T}D) \end{pmatrix} \le 0. \tag{5.65}$$
This lemma can also be stated in the equivalent LMI form
$$\exists P = P^{\mathsf T} > 0: \quad \begin{pmatrix} A^{\mathsf T}P + PA & PB & C^{\mathsf T} \\ B^{\mathsf T}P & -I & D^{\mathsf T} \\ C & D & -I \end{pmatrix} \le 0. \tag{5.66}$$

A scattering system is passive when its transfer function H(s) is Bounded Real (BR), i.e., the following three conditions hold [2, 81, 90]:
1. H(s) must be regular in the open right half complex plane $\Re\{s\} > 0$;
2. $H(s^*) = H^*(s)$;
3. $\Psi(s) = I - H^{\mathsf H}(s)H(s) \ge 0$ for $\Re\{s\} > 0$.


These conditions should be compared to the PR conditions of Section 5.2.1.2, noting that the only difference between PR and BR is in the definition of the function $\Psi(s)$. Correspondingly, the frequency-domain inequality conditions for the passivity of a scattering system still require $\Psi(j\omega) \ge 0$ for all $\omega \in \mathbb{R}$, and can be expressed as in (5.12). An equivalent statement is based on the singular values of the transfer function
$$\sigma_i \le 1, \quad \forall \sigma_i \in \sigma(H(j\omega)), \quad \forall \omega \in \mathbb{R}, \tag{5.67}$$
which implies in turn that passive scattering models must have a bounded and regular transfer function H(s) when restricted to the imaginary axis $s = j\omega$, further requiring that the state-space matrix A must be asymptotically stable. Yet another equivalent condition for passivity is expressed in terms of the supremum of the largest singular value along the imaginary axis, leading to the well-known $\mathcal{H}_\infty$ norm condition
$$\|H\|_{\mathcal{H}_\infty} = \sup_{\omega \in \mathbb{R}} \sigma_{\max}(H(j\omega)) \le 1. \tag{5.68}$$
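The singular value condition (5.67) suggests a simple sampling-based screening, sketched below. It is only a rough check under stated assumptions: the frequency grid is finite, so the computed peak is a lower bound on the supremum in (5.68) and violations between grid points or beyond the last sample can be missed. The function name and the scalar test models are hypothetical.

```python
import numpy as np


def max_singular_value_on_grid(A, B, C, D, freqs):
    """Largest singular value of H(jw) = C (jwI - A)^{-1} B + D over a frequency grid."""
    n = A.shape[0]
    peak = 0.0
    for w in freqs:
        H = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
        peak = max(peak, np.linalg.svd(H, compute_uv=False)[0])
    return peak


A = np.array([[-1.0]])
B = np.array([[1.0]])
D = np.array([[0.0]])
freqs = np.linspace(0.0, 10.0, 201)

# H(s) = 0.5/(s+1) stays below the unit threshold, H(s) = 2/(s+1) exceeds it near w = 0.
peak_passive = max_singular_value_on_grid(A, B, np.array([[0.5]]), D, freqs)
peak_violating = max_singular_value_on_grid(A, B, np.array([[2.0]]), D, freqs)
```

This limitation of sampled checks is precisely why the algebraic Hamiltonian test is preferred for locating violations.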


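The Hamiltonian matrix associated with a scattering state-space system (5.1) reads
$$M = \begin{pmatrix} A + B(I - D^{\mathsf T}D)^{-1}D^{\mathsf T}C & -B(I - D^{\mathsf T}D)^{-1}B^{\mathsf T} \\ C^{\mathsf T}(I - DD^{\mathsf T})^{-1}C & -A^{\mathsf T} - C^{\mathsf T}D(I - D^{\mathsf T}D)^{-1}B^{\mathsf T} \end{pmatrix}. \tag{5.69}$$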


8 We retain the general term passivity also in the scattering case (and for general quadratic supply rates), as a standard denomination in circuit, electronic and electromagnetic applications, although in some scientific communities this term is reserved for immittance representations, and the term dissipative is used in the more general setting.
The system is passive if M has no purely imaginary eigenvalues (strictly passive) or at most purely imaginary eigenvalues with even-sized Jordan blocks [12, 33]. The same considerations of Section 5.2.1.3 apply. When the model is not asymptotically passive for $\omega \to \infty$, then D has one unit singular value and the above Hamiltonian matrix becomes ill-defined. In this case, M generalizes to the pencil (M, N), where
$$M = \begin{pmatrix} A & 0 & B & 0 \\ 0 & -A^{\mathsf T} & 0 & -C^{\mathsf T} \\ 0 & B^{\mathsf T} & -I & D^{\mathsf T} \\ C & 0 & D & -I \end{pmatrix}, \qquad N = \begin{pmatrix} I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \tag{5.70}$$
which replaces (5.56) for scattering representations [91, 94, 97, 98]. Finally, when the underlying system is in descriptor form (5.61), we simply replace I with E in N, as in (5.63).
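As an illustration of the Hamiltonian test for scattering models, the sketch below builds one common form of the bounded-real Hamiltonian, written with the $I - D^{\mathsf T}D$ convention, and extracts its purely imaginary eigenvalues. The helper name and the scalar model $H(s) = 2/(s+1)$ are assumptions; for this model $|H(j\omega)| = 1$ at $\omega = \sqrt{3}$, which is where the crossings are expected.

```python
import numpy as np


def scattering_hamiltonian(A, B, C, D):
    """Hamiltonian matrix of a scattering model (requires I - D^T D nonsingular)."""
    Rinv = np.linalg.inv(np.eye(D.shape[1]) - D.T @ D)
    Sinv = np.linalg.inv(np.eye(D.shape[0]) - D @ D.T)
    F = A + B @ Rinv @ D.T @ C
    return np.block([[F,              -B @ Rinv @ B.T],
                     [C.T @ Sinv @ C, -F.T]])


A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[2.0]])
D = np.array([[0.0]])

mu = np.linalg.eigvals(scattering_hamiltonian(A, B, C, D))
# Purely imaginary eigenvalues mark the frequencies where a singular value crosses 1.
crossings = np.sort(mu.imag[np.abs(mu.real) < 1e-8])
```

A general-purpose eigensolver is used here only for brevity; for larger models a structure-preserving solver is the safer option, since rounding errors can move eigenvalues off the imaginary axis.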

With all above redefinitions of appropriate passivity conditions for scattering sys- 
tems, all passivity check and enforcement algorithms discussed in Section 5.3 and Sec- 
tion 5.5 apply with obvious modifications. 


5.6.4.2 General quadratic supply rates 


Immittance and scattering representations are just particular cases of the more general situation in which the supply rate is a quadratic function of input and output variables. Such a case is compactly described by
$$s(u, y) = \begin{pmatrix} u \\ y \end{pmatrix}^{\mathsf T} \begin{pmatrix} Q & S \\ S^{\mathsf T} & R \end{pmatrix} \begin{pmatrix} u \\ y \end{pmatrix}, \tag{5.71}$$
with $Q = Q^{\mathsf T}$ and $R = R^{\mathsf T}$, from which the immittance case (5.7) and the scattering case (5.64) are obtained by setting $Q = R = 0$, $S = I$ (up to the irrelevant scaling factor 1/2) and $Q = I$, $R = -I$, $S = 0$, respectively.

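The LMI condition (KYP lemma) that characterizes a passive (dissipative) state-space system (5.1) with supply rate (5.71) reads
$$\exists P = P^{\mathsf T} > 0: \quad \begin{pmatrix} A^{\mathsf T}P + PA - C^{\mathsf T}RC & PB - (SC)^{\mathsf T} - C^{\mathsf T}RD \\ B^{\mathsf T}P - SC - D^{\mathsf T}RC & -Q - SD - (SD)^{\mathsf T} - D^{\mathsf T}RD \end{pmatrix} \le 0, \tag{5.72}$$
and the corresponding Frequency-Domain Inequality reads
$$\Psi(j\omega) = Q + H^{\mathsf H}(j\omega)S^{\mathsf T} + S\,H(j\omega) + H^{\mathsf H}(j\omega)\,R\,H(j\omega) \ge 0, \quad \forall \omega \in \mathbb{R}. \tag{5.73}$$
Finally, the Hamiltonian matrix that generalizes (5.16) and (5.69) reads
$$M = \begin{pmatrix} A - BW^{-1}Z & -BW^{-1}B^{\mathsf T} \\ -C^{\mathsf T}RC + Z^{\mathsf T}W^{-1}Z & -A^{\mathsf T} + Z^{\mathsf T}W^{-1}B^{\mathsf T} \end{pmatrix}, \tag{5.74}$$
where $W = Q + SD + (SD)^{\mathsf T} + D^{\mathsf T}RD$ and $Z = SC + D^{\mathsf T}RC$. We leave all details to the reader, pointing to [74, 88, 89] for a complete theoretical discussion. With suitable modifications, all passivity verification and enforcement schemes of Section 5.3 and Section 5.5 are applicable to this general case as well.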
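A brief sketch can make the unifying role of the quadratic supply rate explicit. The helper below evaluates the frequency-domain function of (5.73) for one complex frequency sample and checks that the immittance and scattering choices of (Q, S, R) stated above reduce to the familiar expressions. The sample matrix `H` is arbitrary illustrative data, and S is assumed real.

```python
import numpy as np


def psi_general(H, Q, S, R):
    """Psi = Q + H^H S^T + S H + H^H R H for one frequency sample H (real S assumed)."""
    Hh = H.conj().T
    return Q + Hh @ S.T + S @ H + Hh @ R @ H


H = np.array([[0.3 + 0.4j, 0.1 + 0.0j],
              [0.0 + 0.0j, 0.2 - 0.1j]])
I2, O2 = np.eye(2), np.zeros((2, 2))

psi_immittance = psi_general(H, O2, I2, O2)    # Q = R = 0, S = I
psi_scattering = psi_general(H, I2, O2, -I2)   # Q = I, R = -I, S = 0
```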


5.6.5 Enforcing stability 


All results and algorithms presented up to now are based on the fundamental assump- 
tion that the system at hand is asymptotically stable, with all eigenvalues of state ma- 
trix A, or pencil (A, E) in the descriptor case, having a strictly negative real part. Many 
MOR schemes are able to preserve stability if the original model is stable, for instance 
balanced truncation [5, Chapter 2] or Krylov subspace methods based on split congru- 
ence transformations such as PRIMA [6, Chapter 4]. Basic Arnoldi or Lanczos methods 
are instead not generally able to preserve stability in the reduced order model. Con- 
sidering data-driven methods, the Vector Fitting algorithm [5, Chapter 8] incorporates 
a pole-flipping strategy that guarantees stability, whereas basic Loewner interpola- 
tion/reduction schemes [5, Chapter 6] do not guarantee stability. Further, even if a 
stability-preserving MOR method is used, roundoff errors in computer implementa- 
tions may compromise the stability and may result in some eigenvalue with a positive 
real part. 

Stabilization of a given model or system is a standard problem in Control Theory, where many alternative approaches, usually based on feedback, are routinely applied. The reader is referred to any textbook such as [99]. The requirements we have in MOR applications are stronger than simple stabilization, since the final model should be as close as possible to the initial (unstable) model according to a prescribed performance metric or norm. Therefore, the simplistic approach of separating the unstable modes through an eigenvalue or, better, Schur decomposition and simply discarding them is not appropriate. Optimal stabilizing approximations are in fact available through robust and reliable algorithms. As an example, we refer the reader to [52], where some approaches for finding the closest stable system based on $\mathcal{H}_2$ and $\mathcal{H}_\infty$ norms are introduced; see also [32].


5.6.6 Parameterized systems 


Passivity verification and enforcement methods can be extended to parameterized systems, whose response $H(s;\vartheta)$ depends both on frequency $s$ and on (multivariate) parameters $\vartheta \in \Theta \subset \mathbb{R}^d$. In this framework, many different approaches and solutions have been proposed, depending on how parameters are embedded in the model and on how the model is constructed. A complete treatment would be outside the scope of this chapter, so we discuss only a specific yet wide class of model parameterizations
$$H(s;\vartheta) = \frac{N(s,\vartheta)}{D(s,\vartheta)} = \frac{\sum_{n=0}^{\bar n}\sum_{\ell=1}^{\bar\ell} R_{n\ell}\,\xi_\ell(\vartheta)\,\varphi_n(s)}{\sum_{n=0}^{\bar n}\sum_{\ell=1}^{\bar\ell} r_{n\ell}\,\xi_\ell(\vartheta)\,\varphi_n(s)}, \tag{5.75}$$
where $R_{n\ell} \in \mathbb{R}^{P\times P}$ and $r_{n\ell} \in \mathbb{R}$ are the model coefficients, $\varphi_n(s)$ and $\xi_\ell(\vartheta)$ are suitable basis functions representing the dependence of model numerator and denominator on frequency and parameters, respectively, and $\ell$ is a scalar index spanning the parameter basis set through a suitable linear ordering. The parameterization (5.75) includes as particular cases the multivariate barycentric form leading to the (parameterized) Loewner framework [50] (see [5, Chapter 6]) and the generalized Sanathanan-Koerner form [38, 80], which extends the Vector Fitting scheme [5, Chapter 8] to the multivariate setting. In the latter case the frequency basis functions are $\varphi_0(s) = 1$ and $\varphi_n(s) = (s - q_n)^{-1}$ for $n > 0$, where $q_n$ are predefined stable “basis poles”, either real or in complex conjugate pairs, and the parameter-dependent basis functions $\xi_\ell$ can be orthogonal or trigonometric polynomials, or any other choice that is appropriate for the application at hand. A parameterized (descriptor) realization is easily obtained from (5.75) as (5.61), where
$$A = A(\vartheta) = \sum_{\ell=1}^{\bar\ell} A_\ell\,\xi_\ell(\vartheta), \qquad C = C(\vartheta) = \sum_{\ell=1}^{\bar\ell} C_\ell\,\xi_\ell(\vartheta), \tag{5.76}$$
and E, B, D are constant.
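The structure of (5.75) can be illustrated by a direct evaluation routine. The sketch below assumes the partial-fraction frequency basis described above and stores the coefficients in dense arrays; the function name, the array layout, and the toy coefficients are assumptions made for the example and do not reflect any particular implementation.

```python
import numpy as np


def eval_parameterized(s, theta, R, r, poles, xi):
    """Evaluate H(s; theta) = N / D in the form (5.75).

    R: array of shape (n_basis, l_basis, P, P) with numerator coefficients.
    r: array of shape (n_basis, l_basis) with scalar denominator coefficients.
    poles: basis poles q_n, n >= 1; xi: list of callables xi_l(theta).
    """
    # Frequency basis: phi_0 = 1 and phi_n = 1 / (s - q_n) for n >= 1.
    phi = np.concatenate(([1.0], 1.0 / (s - np.asarray(poles))))
    x = np.array([f(theta) for f in xi])
    num = np.einsum("n,l,nlij->ij", phi, x, R)
    den = np.einsum("n,l,nl->", phi, x, r)
    return num / den


# Toy scalar model: N = (1 + theta) / (s + 1), D = 1.
poles = [-1.0]
xi = [lambda th: 1.0, lambda th: th]
R = np.zeros((2, 2, 1, 1))
R[1, 0, 0, 0] = 1.0
R[1, 1, 0, 0] = 1.0
r = np.zeros((2, 2))
r[0, 0] = 1.0

H_val = eval_parameterized(1.0, 2.0, R, r, poles, xi)
```

Each evaluation at a fixed parameter value yields a non-parameterized model sample, to which the passivity checks of the previous sections can be applied.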

One notable and simple approach to obtain a uniformly passive model, so that all passivity conditions discussed in Section 5.2.1 hold $\forall \vartheta \in \Theta$, is to suppress the denominator in (5.75) as $D(s,\vartheta) = 1$ and construct the parameterized system through interpolation of a set of non-parameterized models. This is achieved by choosing $\xi_\ell$ as interpolating, e.g., Lagrange, basis functions. There exist passivity-preserving interpolation schemes that ensure that, if the individual models being interpolated are passive, then the interpolated parameterized model is also passive $\forall \vartheta \in \Theta$. See [26, 27, 28, 29, 70] and the references therein for details on various alternative approaches within this framework.

A complementary approach is to consider the fully-parameterized model in 
form (5.75) and extend the passivity verification and enforcement methods of Sec- 
tion 5.3 and Section 5.5 to the multivariate case. The main difficulty that arises in this 
scenario is that the Hamiltonian matrix, which is the main tool providing localization 
of the passivity violations, becomes parameter-dependent due to (5.76). The conve- 
nience of the purely algebraic test based on its eigenvalues is partially lost, since the 
purely imaginary eigenvalues (if any) are parameter-dependent. A possible strategy 
for tracking these eigenvalues based on adaptive sampling in the parameter space is 
discussed in [95], where a first-order perturbation analysis on the full Hamiltonian 
eigenspectrum is used to determine the regions in the parameter space that need re- 
fined sampling, in order to track the boundaries between the regions defining passive 
and non-passive models. Figure 5.5 provides an illustration by depicting the results of 
this adaptive sampling process in a case with two parameters d = 2. If any passivity 
violation region is detected (the red dots in Figure 5.5, left panel), then the worst-case 
passivity violations are determined as in Algorithm 5.1 and a multivariate extension 
of Algorithm 5.3 is applied to eliminate them. All details are available in [37, 40, 95]. 
Figure 5.5: Adaptive sampling in a two-dimensional parameter space, applied to a non-passive
model (left panel) and to the corresponding passive model after enforcement (right). Each dot rep- 
resents a non-parameterized model instance obtained by evaluating the parameterized model (5.75) 
at the corresponding sampling point. Each dot is colored in green/red if the corresponding model 
instance is locally passive/non-passive, respectively, as resulting from the absence/presence of 
imaginary Hamiltonian eigenvalues. Iterative refinement leads to tracking the boundaries between 
passive/non-passive regions. Courtesy of A. Zanco, Politecnico di Torino. 


5.7 Examples 


5.7.1 A high-speed interconnect in a mobile device 


The passivity enforcement process is here applied to a model of a high-speed intercon- 
nect providing a data link in a smartphone. An initial characterization of the struc- 
ture was obtained through a full-wave numerical simulation of the time-harmonic 
Maxwell’s equations, which provided a set of frequency samples of the 4 x 4 (scatter- 
ing) transfer function S(jw) from 0 to 50 GHz. These samples were processed by Vector 
Fitting [5, Chapter 8], obtaining a rational approximation of the system responses. This 
rational approximation was then converted to a state-space realization as in [5, Chap- 
ter 8]. The accuracy of the rational approximation is excellent, as depicted in the two 
top panels of Figure 5.6. 

A Hamiltonian-based passivity check on this model reveals the presence of K = 10 purely imaginary Hamiltonian eigenvalues $\mu_k = j\omega_k$ (see Figure 5.7, left panel). Correspondingly, a sweep of the model singular values $\sigma_i(H(j\omega))$ (see the top panel of Figure 5.8) up to a maximum frequency slightly beyond $\omega_K$ reveals a few evident passivity violations, corresponding to singular value trajectories exceeding the passivity threshold $\sigma = 1$. The local singular value maxima are highlighted with red dots in Figure 5.8.

Figure 5.8 depicts the typical situation that arises when fitting a rational model to 
response data over a finite frequency band: passivity violations usually occur at fre- 
quencies that fall outside the range where data samples are available. The fact that 
Figure 5.6: Comparison between model and original data used for model extraction for the smart-
phone interconnect. For illustration, only response Sj5(jw) of the 4 x 4 scattering matrix is reported. 
Top two panels refer to the initial non-passive model, whereas bottom two panels refer to the model 
after passivity enforcement. 
Figure 5.7: Hamiltonian eigenvalues before (left) and after (right) passivity enforcement for the
smartphone interconnect model (only eigenvalues with positive imaginary parts are shown). The 
purely imaginary eigenvalues are highlighted with a darker color in the left panel. 


such violations are not located within the modeling bandwidth may induce a false sense of confidence in the model user, who may argue that out-of-band passivity violations are unimportant, since they are located in frequency ranges that are not of interest. In fact, a time-domain simulation of the model using a transient ODE solver is agnostic as to whether the passivity violation occurs within or outside the band: during time-stepping, numerical approximation errors due to the adopted ODE solver will inevitably excite those frequencies where the model amplifies energy, leading to instability. This is exactly what happens in Figure 5.1, where the thin blue line demonstrates the instability induced by this initial non-passive model. In this simulation scenario, the model was interconnected to a set of other linear (passive) circuits, and it was indeed possible to determine exactly the two poles $p = 2\pi(\alpha \pm j\beta)$ that are responsible for this instability, obtaining α = +1.13 × 10° Hz and β = 5.32 × 10° Hz. The real part is positive, and the imaginary part nearly matches the frequency of the singular value peak; see Figure 5.8. This is exactly the frequency where the model injects energy into the system. See [35] for additional details on destabilization of non-passive models.

Enforcing model passivity removes the instability, as we already know from Fig- 
ure 5.1. Application of Algorithm 5.3 leads to a passive model in 5 iterations, docu- 
mented by the singular value plots in the various panels of Figure 5.8. The final pas- 
sive model has no purely imaginary Hamiltonian eigenvalues, as evident from the right 
panel of Figure 5.7, and its responses still match very accurately the original data sam- 
ples, as depicted in the bottom panel of Figure 5.6. 
Figure 5.8: Evolution of the singular values (blue solid lines) of the smartphone interconnect during
passivity enforcement iterations through Algorithm 5.3. Passive and non-passive frequency bands 
are highlighted with green and red color, respectively. Local maxima of all singular value trajecto- 
ries in each non-passive frequency band, which are used to set up local passivity constraints, are 
highlighted with red dots. 
5.7.2 An interconnect link on a high-performance PCB


We consider here the coupled interconnect link on a high-performance Printed Circuit 
Board (PCB), already discussed in [5, Chapter 8], where an accurate model was ex- 
tracted using the Vector Fitting algorithm from scattering measurements performed on 
the real hardware. As depicted in [5, Chapter 8], Figures 6 and 7, the model responses
of this initial model are visually indistinguishable from the measured samples, with
a model-data error of 1.34 - 107° (worst-case RMS error among all responses). 

A passivity check performed on this initial model reveals some small passivity violations
at low frequencies. This is actually expected, since the system is almost lossless at low
frequency, and passivity violations induced by the rational approximation process of VF
are therefore more likely than at high frequency, where energy dissipation is more
pronounced. The passivity violations are detected by the presence of eight pairs of purely
imaginary Hamiltonian eigenvalues, depicted in Figure 5.9, panels (a), (c) and (d). The
corresponding frequencies denote crossings of the singular value trajectories
$\sigma_i(j\omega)$ of the threshold $\sigma = 1$, as depicted in Figure 5.10, top panel.
After a few iterations of Algorithm 5.3 all these passivity violations are removed. As the
bottom panel of Figure 5.10 shows, all singular value trajectories of the passive model
are uniformly bounded by one. This is further confirmed by panels (e) and (f) of
Figure 5.9, which show that all purely imaginary eigenvalues of the initial model are now
displaced from the imaginary axis.

Figure 5.9: Hamiltonian eigenvalues of the PCB interconnect model, scaled by $1/(2\pi)$ and
plotted as $\alpha + j\beta$. Panels (a), (c), (d): original model after Vector Fitting;
panels (b), (e), (f): model after passivity enforcement. Top panels (a), (b) depict the
full Hamiltonian eigenspectrum. Bottom panels (c), (d) and (e), (f) are enlarged views of
the top panels (a) and (b), respectively, at different magnification levels. Panels (e)
and (f) show that all purely imaginary Hamiltonian eigenvalues clustered at low
frequencies, depicted in panels (c) and (d), are effectively removed by passivity
enforcement.


Figure 5.10: Top panel (original, non-passive model): singular value trajectories (blue
lines) versus frequency (GHz) of the initial PCB interconnect model, revealing
low-frequency passivity violations exceeding the passivity threshold $\sigma = 1$ (red
line). Black dots correspond to the frequencies of the purely imaginary Hamiltonian
eigenvalues; see Figure 5.9, panels (c) and (d). Bottom panel (passive model): singular
value trajectories after passivity enforcement, which are uniformly below the passivity
threshold.


The passivity enforcement process did not spoil model accuracy. Figure 5.11 compares 
the scattering responses (1, 2) and (1, 3) of the passive model to the raw measured sam- 
ples from which the initial model was derived (the same responses already depicted 
in [5, Chapter 8], Figures 6 and 7). Also for the passive model the responses closely 
match the measurements, with a worst-case RMS model-data error of 1.38 - 10°. 


Figure 5.11: Comparison between passive model responses and measured data used for model
identification for the high-speed PCB interconnect of Section 5.7.2.


5.8 Conclusions


The goal of this chapter was to survey the most widely used techniques for enforcing
passivity of reduced order models. To motivate the ensuing description, a simple example
was shown that illustrates in striking fashion the need for ensuring passivity in models.
The focus of the chapter was on Linear Time-Invariant (LTI) systems in state-space form,
although the techniques reviewed are applicable in other representations with appropriate
modifications. Conditions for testing the passivity of a given LTI model, as well as
approaches for perturbing non-passive systems in order to enforce passivity, were
reviewed, and examples were shown to demonstrate the application of such techniques to
realistic cases.

6 The Loewner framework for system
identification and reduction 


Abstract: One of the main approaches to model reduction of both linear and nonlinear 
dynamical systems is by means of interpolation. This approach seeks reduced models 
whose transfer function matches that of the original system at selected interpolation 
points. Data-driven methods constitute an important special case. We start with an 
account of the Loewner framework in the linear case [52]. It constructs models from 
given data in a straightforward manner. An important attribute is that it provides a 
trade-off between accuracy of fit and complexity of the model. We compare this ap- 
proach with other approximation methods and apply it to different test-cases. One 
of the case studies to which we apply the aforementioned methods is defined by the 
inverse of the Bessel function. We then turn our attention to the approximation of an 
Euler-Bernoulli beam model with Rayleigh damping. Further case studies include the
approximation of two real valued functions with specific difficulties (discontinuity, 
sharp peaks). One computational tool is the SVD; its complexity is cubic in the num- 
ber of data points. For large data sets the CUR factorization is a viable alternative. Note 
that its complexity is cubic as well but with respect to the dimension of the reduced or- 
der model (ROM). Another option is to use stochastic procedures such as randomized 
singular value decomposition (r-SVD) [41]. 


Keywords: Loewner framework, rational interpolation, model order reduction, data- 
driven, system identification, infinite dimensional systems 

6.1 Introduction


A challenging problem that computational linear algebra deals with is that of big data 
modeling. The problem consists mainly in constructing reduced complexity systems 
from input/output data. This contribution focuses on reduction via interpolation. The 
Loewner framework is a data-driven approach which can construct low order models
from measurements. It can be applied to both frequency and time-domain data [56].
Here we will concentrate on frequency domain data. The Loewner framework will be
implemented using (a) the SVD (singular value decomposition), (b) the CUR
factorization, and (c) the randomized SVD (r-SVD). Its performance will be compared with
that of the recently developed AAA algorithm (see [53]), the Vector Fitting approach
[21, 40] and the IRKA algorithm [13].

The chapter is composed of three parts. The first one covers the fundamentals of the
Loewner framework, starting from left and right interpolatory reduction. It concludes
(a) by describing an interpolation property satisfied by reduced systems and (b) by
making explicit the procedure of obtaining real reduced models (despite complex
interpolation points and values). Next, two algorithms, namely Loewner-SISO and
Loewner-MIMO, are described. Finally, two simple examples are presented and the role of
generalized inverses is outlined.

The second part describes methods for implementing the Loewner reduction, namely the
SVD, the CUR factorization, and the role of splitting the interpolation points into left
and right sets. The third part illustrates the main features of the Loewner approach by
means of seven case studies, namely, (a) the CD player, (b) an oscillating function,
(c) the inverse of a Bessel function, (d) an Euler-Bernoulli beam, (e) a heat equation,
(f) a function with two sharp peaks, and (g) the sign function. An epilogue and
references conclude the presentation.


6.2 The Loewner framework and moment matching 


The Loewner framework has attracted increasing attention from researchers in various
fields of applied mathematics and control engineering over the last 13 years.
Consequently, a fair number of contributions are now available that further extend the
framework or apply it to different test cases. Below we provide an account of some of
the work related to or inspired by the "Loewner framework" (see Table 6.1).

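Consider linear, time-invariant systems with m inputs, n internal variables (states
if E is non-singular) and p outputs:

\[
E\dot{x}(t) = A x(t) + B u(t), \quad y(t) = C x(t), \quad \text{where }
E, A \in \mathbb{R}^{n\times n},\ B \in \mathbb{R}^{n\times m},\ C \in \mathbb{R}^{p\times n}.
\tag{6.1}
\]

We will denote this realization of the system by means of the quadruple
$\Sigma = (C, E, A, B)$. The associated transfer function is

\[
H(s) = C\,\Phi(s)\,B, \quad \text{where } \Phi(s) = (sE - A)^{-1} \in \mathbb{C}^{n\times n}.
\tag{6.2}
\]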

Table 6.1: A collection of contributions related to the Loewner framework.

Original paper [52] & tutorial paper [6]       | Chapters 4 and 7 in the book [9]
-----------------------------------------------+-------------------------------------------------
Extension to                                   | Application to
-----------------------------------------------+-------------------------------------------------
parametrized linear systems [4, 42]            | modeling multi-port linear systems [48]
bilinear systems [5, 45, 46]                   | preserving the stability of the ROM [30]
quadratic systems [29, 36]                     | the Burgers equations [8]
quadratic-bilinear systems [32]                | the Oseen equations [10]
linear switched systems [34]                   | preserving the structure of DAE systems [37]
polynomial systems [11, 16]                    | systems with delay [35, 59]
modeling from noisy data [20, 50]              | approximating functions [31, 33, 43, 44]
modeling from time-domain data [56]            | singular/rectangular systems [3]
                                               | genes oscillations [7] and biological rhythms [68]
-----------------------------------------------+-------------------------------------------------
Perspective based on duality and application   | Interpretation based on interconnection and
to bilinear differential [57, 58]              | application to LTV systems [60, 61]


A common way to reduce the complexity of a system is by means of Petrov-Galerkin
projections. Such projections are defined by means of two matrices
$V, W \in \mathbb{R}^{n\times k}$, $k < n$, satisfying the condition that
$W^T V \in \mathbb{R}^{k\times k}$ is invertible.


Definition 6.1. Consider $v_i, w_i \in \mathbb{R}^n$, $i = 1,\ldots,k$, and let
$V = [v_1,\ldots,v_k]$, $W = [w_1,\ldots,w_k] \in \mathbb{R}^{n\times k}$. The map defined
by $\Pi_1 = V(V^T V)^{-1}V^T$ is an orthogonal projection onto the span of the columns of
V. If $W^T V$ is non-singular, $\Pi_2 = V(W^T V)^{-1}W^T$ is an oblique projector onto the
span of the columns of V, along the columns of W. $\Pi_1$ and $\Pi_2$ are usually referred
to in the model reduction literature as Galerkin and Petrov-Galerkin projectors,
respectively.
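
The two projectors of Definition 6.1 can be checked numerically. The following is a minimal sketch in Python/NumPy, not part of the original text; the function names, the random test matrices and the seed are illustrative assumptions.

```python
import numpy as np

def galerkin_projector(V):
    """Orthogonal projector V (V^T V)^{-1} V^T onto span col V."""
    return V @ np.linalg.solve(V.T @ V, V.T)

def petrov_galerkin_projector(V, W):
    """Oblique projector V (W^T V)^{-1} W^T; W^T V must be non-singular."""
    return V @ np.linalg.solve(W.T @ V, W.T)

rng = np.random.default_rng(0)      # arbitrary seed (assumption)
n, k = 6, 2
V = rng.standard_normal((n, k))     # hypothetical test matrices
W = rng.standard_normal((n, k))

Pi1 = galerkin_projector(V)
Pi2 = petrov_galerkin_projector(V, W)

# Both maps are idempotent and leave the columns of V unchanged;
# only Pi1 is symmetric, which is what makes it orthogonal.
print(np.allclose(Pi1 @ Pi1, Pi1), np.allclose(Pi2 @ Pi2, Pi2))
print(np.allclose(Pi1, Pi1.T), np.allclose(Pi2 @ V, V))
```

The oblique projector additionally satisfies $W^T(I - \Pi_2) = 0$, i.e., the residual of the projection is orthogonal to the columns of W.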


Reducing the system $\Sigma = (C, E, A, B)$ defined above by means of a Petrov-Galerkin
projection, we obtain the reduced system $\hat\Sigma = (\hat C, \hat E, \hat A, \hat B)$
with the reduced order matrices given by

\[
\hat C = CV \in \mathbb{R}^{p\times k}, \quad
\hat E = W^T E V, \quad
\hat A = W^T A V \in \mathbb{R}^{k\times k}, \quad
\hat B = W^T B \in \mathbb{R}^{k\times m}.
\tag{6.3}
\]

There are many ways of choosing Petrov-Galerkin projectors in order to achieve
various goals. Here we will restrict our attention to interpolatory projections. Such
projectors yield reduced models which match moments of the original system. These
moments are values of transfer functions at selected frequencies, referred to as
interpolation points.


Remark 6.1. The D-term. In the system representations to follow no explicit D-terms
will be considered. The reason is that such terms can be incorporated in the remaining
matrices of the realization, thus yielding what is known as a descriptor representation.
Consider a rank-revealing factorization

\[
D = D_1 D_2, \quad \text{where } D_1 \in \mathbb{R}^{p\times\rho},\ D_2 \in \mathbb{R}^{\rho\times m},
\]

and $\rho = \operatorname{rank} D$. It readily follows that

\[
\tilde E = \begin{bmatrix} E & 0 \\ 0 & 0 \end{bmatrix}, \quad
\tilde A = \begin{bmatrix} A & 0 \\ 0 & -I_\rho \end{bmatrix}, \quad
\tilde B = \begin{bmatrix} B \\ D_2 \end{bmatrix}, \quad
\tilde C = \begin{bmatrix} C & D_1 \end{bmatrix}
\]

is a descriptor realization of the same system with no explicit D-term. The reason for
not considering explicit D-terms comes from the fact that the Loewner framework
yields descriptor realizations with the D-term incorporated in the rest of the
realization.


1 The notation $(\cdot)^T$ indicates transposition of $(\cdot)$, while the notation
$(\cdot)^*$ indicates transposition of $(\cdot)$ followed by complex conjugation.


6.2.1 Moments of a system 


Given a matrix-valued function of time $h : \mathbb{R} \to \mathbb{R}^{p\times m}$, its
kth moment is

\[
\eta_k = \int_0^\infty t^k h(t)\,dt, \quad k = 0, 1, 2, \ldots
\]

If this function has a Laplace transform defined by
$H(s) = \mathcal{L}(h)(s) = \int_0^\infty h(t)e^{-st}\,dt$, the kth moment of h is, up to a
sign, the kth derivative of H evaluated at s = 0:

\[
\eta_k = (-1)^k \frac{d^k}{ds^k} H(s)\Big|_{s=0} \in \mathbb{R}^{p\times m}, \quad k = 0, 1, 2, \ldots
\]

In the sequel, we will also make use of a generalized notion of moments, namely the
moments of h around the (arbitrary) point $s_0 \in \mathbb{C}$:

\[
\eta_k(s_0) = \int_0^\infty t^k h(t) e^{-s_0 t}\,dt.
\]

These generalized moments turn out to be (up to a sign) the derivatives of H(s)
evaluated at $s = s_0$:

\[
\eta_k(s_0) = (-1)^k \frac{d^k}{ds^k} H(s)\Big|_{s=s_0} \in \mathbb{C}^{p\times m}, \quad k = 0, 1, 2, \ldots
\]

In this context, assuming for simplicity that E = I, the moments of h at
$s_0 \in \mathbb{C}$ are

\[
\eta_k(s_0) = k!\, C (s_0 I - A)^{-(k+1)} B, \quad k = 0, 1, 2, \ldots,
\]

provided that $s_0$ is not an eigenvalue of A.
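
As a sanity check of the closed-form expression for the moments (with E = I), the sketch below compares it with a direct numerical quadrature of the defining integral. It is an illustration only: the 2 x 2 system, the point `s0`, and the truncation of the integral at `T = 40` are assumptions chosen so that the integrand decays quickly.

```python
import numpy as np
from math import factorial

# Hypothetical stable test system with E = I (not taken from the text).
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
s0 = 0.5

def moment_formula(k):
    """eta_k(s0) = k! C (s0 I - A)^{-(k+1)} B."""
    M = np.linalg.inv(s0 * np.eye(2) - A)
    return factorial(k) * (C @ np.linalg.matrix_power(M, k + 1) @ B).item()

def moment_quadrature(k, T=40.0, num=400001):
    """Trapezoidal approximation of int_0^T t^k h(t) exp(-s0 t) dt."""
    t = np.linspace(0.0, T, num)
    lam, S = np.linalg.eig(A)
    # h(t) = C exp(At) B, evaluated through the eigendecomposition of A
    coeff = (C @ S).ravel() * np.linalg.solve(S, B).ravel()
    h = (np.exp(np.outer(t, lam)) * coeff).sum(axis=1).real
    f = t**k * h * np.exp(-s0 * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

for k in range(3):
    print(k, moment_formula(k), moment_quadrature(k))
```

For this system $h(t) = 2e^{-t} - e^{-2t}$, so that $\eta_0(0.5) = 2/1.5 - 1/2.5 = 14/15$, which both routines reproduce.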
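
Notice that the moments determine the coefficients of the Laurent series expansion
of the transfer function H(s) in the neighborhood of $s_0 \in \mathbb{C}$; in particular

\[
\begin{aligned}
H(s) &= H(s_0) + H^{(1)}(s_0)\frac{(s-s_0)}{1!} + \cdots + H^{(k)}(s_0)\frac{(s-s_0)^k}{k!} + \cdots \\
     &= \eta_0(s_0) - \eta_1(s_0)\frac{(s-s_0)}{1!} + \cdots + (-1)^k\,\eta_k(s_0)\frac{(s-s_0)^k}{k!} + \cdots,
\end{aligned}
\]

where the alternating signs follow from $H^{(k)}(s_0) = (-1)^k \eta_k(s_0)$.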


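Approximation by moment matching
Given $\Sigma = (C, E, A, B)$, consider the expansion of the transfer function around
$s_i$, $i = 1, \ldots, r$, as above. Approximation by moment matching consists in finding

\[
\hat\Sigma = (\hat C, \hat E, \hat A, \hat B), \quad
\hat E, \hat A \in \mathbb{R}^{k\times k},\ \hat B \in \mathbb{R}^{k\times m},\ \hat C \in \mathbb{R}^{p\times k},
\tag{6.4}
\]

such that the expansion of the transfer function

\[
\hat H(s) = \hat\eta_0(s_i) - \hat\eta_1(s_i)\frac{(s-s_i)}{1!} + \hat\eta_2(s_i)\frac{(s-s_i)^2}{2!}
 - \hat\eta_3(s_i)\frac{(s-s_i)^3}{3!} + \cdots,
\]

for appropriate $s_i \in \mathbb{C}$, and $\ell_i, r \in \mathbb{N}$, satisfies

\[
\eta_j(s_i) = \hat\eta_j(s_i), \quad j = 1, 2, \ldots, \ell_i \ \text{ and } \ i = 1, \ldots, r.
\]

This problem is also known as rational interpolation.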


6.2.2 Rational interpolation by Petrov-Galerkin projection


Rational interpolation by projection was originally proposed in the work of Skelton 
and co-workers; see [65, 66, 67]. Contributions were also made by Grimme, Gallivan 
and van Dooren [23, 24, 38]. 

Suppose that we are given a system $\Sigma = (C, E, A, B)$, assumed SISO (single-input
single-output, i.e., m = p = 1) for simplicity. We wish to find lower dimensional models
$\hat\Sigma = (\hat C, \hat E, \hat A, \hat B)$, defined as in (6.3), k < n, such that
$\hat\Sigma$ approximates the original system in an appropriate way.

Consider k distinct points $s_j \in \mathbb{C}$. Define V as a generalized
controllability matrix:

\[
V = \left[ (s_1 E - A)^{-1}B, \ldots, (s_k E - A)^{-1}B \right] \in \mathbb{C}^{n\times k},
\tag{6.5}
\]

and let $W^*$ be any left inverse of V. Then we have the following.


Proposition 6.1. $\hat\Sigma$ interpolates the transfer function of $\Sigma$ at the points
$s_j$, that is,

\[
H(s_j) = C(s_j E - A)^{-1}B = \hat C(s_j \hat E - \hat A)^{-1}\hat B = \hat H(s_j), \quad j = 1, \ldots, k.
\]
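
Proof. Denoting by $e_j = [0 \cdots 0\ 1\ 0 \cdots 0]^T$ the jth unit vector (with the 1 in
position j), we obtain the string of equalities below, which lead to the desired result:

\[
\begin{aligned}
\hat C(s_j \hat E - \hat A)^{-1}\hat B
 &= CV\left(s_j W^* E V - W^* A V\right)^{-1} W^* B \\
 &= CV\left(W^*(s_j E - A)V\right)^{-1} W^* B \\
 &= CV\left([\,* \cdots\ W^*B\ \cdots *\,]\right)^{-1} W^* B \\
 &= \left[ C(s_1 E - A)^{-1}B, \ldots, C(s_k E - A)^{-1}B \right] e_j \\
 &= C(s_j E - A)^{-1}B.
\end{aligned}
\]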

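Next, we are concerned with matching the value of the transfer function at a given
point $s_0 \in \mathbb{C}$, together with k - 1 derivatives. We define

\[
V = \left[ (s_0 E - A)^{-1}B,\ (s_0 E - A)^{-2}B,\ \ldots,\ (s_0 E - A)^{-k}B \right] \in \mathbb{C}^{n\times k},
\tag{6.6}
\]

together with any left inverse $W^T$. The following holds.


Proposition 6.2. $\hat\Sigma$ interpolates the transfer function of $\Sigma$ at $s_0$,
together with k - 1 derivatives at the same point:

\[
\frac{(-1)^j}{j!}\frac{d^j}{ds^j}H(s)\Big|_{s=s_0}
 = C(s_0 E - A)^{-(j+1)}B
 = \hat C(s_0 \hat E - \hat A)^{-(j+1)}\hat B
 = \frac{(-1)^j}{j!}\frac{d^j}{ds^j}\hat H(s)\Big|_{s=s_0}
\]

for j = 0, 1, ..., k - 1.


Proof. Let V be as defined in (6.6), and W be such that $W^T V = I_k$. It readily follows
that the $\ell$th power of the projected matrix $s_0\hat E - \hat A$ satisfies

\[
(s_0 \hat E - \hat A)^{\ell} = [\,* \cdots *\ \ W^T B\ \ * \cdots *\,],
\]

with $W^T B$ in the $\ell$th column. Consequently
$\left[W^T(s_0 E - A)V\right]^{-\ell} W^T B = e_\ell$, which finally implies

\[
\hat C(s_0 \hat E - \hat A)^{-\ell}\hat B
 = CV\left[W^T(s_0 E - A)V\right]^{-\ell} W^T B = CVe_\ell = C(s_0 E - A)^{-\ell}B,
\]

for $\ell = 1, 2, \ldots, k$.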


Since any matrix that spans the same column space as V achieves the same objective,
projectors composed of combinations of the cases (6.5) and (6.6) achieve matching of
an appropriate number of moments. To formalize this we will make use of the following
matrices:

\[
\mathcal{R}_k(E, A, B; \sigma) = \left[ (\sigma E - A)^{-1}B \ \ (\sigma E - A)^{-2}B \ \cdots\ (\sigma E - A)^{-k}B \right].
\]

Corollary 6.1.

(a) If V as defined above is replaced by $\tilde V = VR$, $R \in \mathbb{R}^{k\times k}$,
$\det R \neq 0$, and W by $\tilde W = W R^{-T}$, the same matching results hold true.

(b) Let V be such that

\[
\operatorname{span\,col} V = \operatorname{span\,col}
\left[ \mathcal{R}_{m_1}(E, A, B; \sigma_1) \ \cdots\ \mathcal{R}_{m_\ell}(E, A, B; \sigma_\ell) \right],
\]

and W be any left inverse of V. The reduced system matches $m_i$ moments at
$\sigma_i \in \mathbb{C}$, $i = 1, \ldots, \ell$.


6.2.3 Two-sided projections 


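The above results can be strengthened if the row span of the matrix $W^T$ is chosen
to match the row span of a generalized observability matrix. In such a case twice as
many moments can be matched with a reduced system of the same dimension. Here, in
addition to the points $s_1, \ldots, s_k$ and the associated (6.5), we are given k
additional distinct points $s_{k+1}, \ldots, s_{2k}$. These points are used to define a
generalized observability matrix:

\[
W = \left[ (s_{k+1} E^T - A^T)^{-1}C^T \ \cdots\ (s_{2k} E^T - A^T)^{-1}C^T \right] \in \mathbb{C}^{n\times k}.
\tag{6.7}
\]


Proposition 6.3. Assuming that $W^T V$ has full rank, the projected system $\hat\Sigma$
interpolates the transfer function of $\Sigma$ at the points $s_i$, $i = 1, \ldots, 2k$.


Proof. The string of equalities that follows proves the desired result:

\[
\begin{aligned}
\hat C(s_i \hat E - \hat A)^{-1}\hat B
 &= CV\left(s_i W^T E V - W^T A V\right)^{-1} W^T B \\
 &= CV\left(W^T(s_i E - A)V\right)^{-1} W^T B \\
 &= CV\left(W^T[\,\cdots\ B\ \cdots\,]\right)^{-1} W^T B \\
 &= CVe_i = C(s_i E - A)^{-1}B, \quad \text{for } i = 1, \ldots, k, \\[4pt]
 &= CV\left( \begin{bmatrix} \vdots \\ C \\ \vdots \end{bmatrix} V \right)^{-1} W^T B \\
 &= e_{i-k}^T W^T B = C(s_i E - A)^{-1}B, \quad \text{for } i = k+1, \ldots, 2k.
\end{aligned}
\]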
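
The doubling of the number of matched points can be observed numerically. The following sketch is illustrative and not from the original text: it uses a hypothetical order-8 SISO system with E = I, three right points for V as in (6.5), and three left points for W as in (6.7).

```python
import numpy as np

rng = np.random.default_rng(2)                  # arbitrary seed (assumption)
n = 8
E = np.eye(n)
A = -np.diag(np.arange(1.0, n + 1)) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

right = [0.5, 1.5, 2.5]     # s_1, ..., s_k      (used in V)
left = [1.0, 2.0, 3.0]      # s_{k+1}, ..., s_{2k} (used in W)

V = np.hstack([np.linalg.solve(s * E - A, B) for s in right])        # (6.5)
W = np.hstack([np.linalg.solve(s * E.T - A.T, C.T) for s in left])   # (6.7)

# Two-sided (Petrov-Galerkin) reduction (6.3); W^T V need not equal I.
Er, Ar, Br, Cr = W.T @ E @ V, W.T @ A @ V, W.T @ B, C @ V

def H(s):
    return (C @ np.linalg.solve(s * E - A, B)).item()

def H_reduced(s):
    return (Cr @ np.linalg.solve(s * Er - Ar, Br)).item()

for s in right + left:
    print(s, abs(H(s) - H_reduced(s)))
```

A reduced model of order k = 3 thus matches the original transfer function at all 2k = 6 points.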


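The projectors (see [62]) discussed in the previous section satisfy Sylvester
equations, as shown next.


Proposition 6.4. With $\Lambda = \operatorname{diag}[\lambda_1, \ldots, \lambda_k]$,
$M = \operatorname{diag}[\mu_1, \ldots, \mu_q]$, where $\lambda_i$ and $\mu_j$ are mutually
distinct, $R = [1 \cdots 1] \in \mathbb{R}^{1\times k}$, and
$L = [1 \cdots 1]^T \in \mathbb{R}^{q}$, the matrices V and W satisfy the Sylvester
equations:

\[
EV\Lambda - AV = BR \quad \text{and} \quad MW^T E - W^T A = LC.
\tag{6.8}
\]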

6.2.4 Interpolatory model reduction for MIMO systems


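In the general case of MIMO (multi-input multi-output) systems, the moments are
p x m matrices. So, in the case of rational matrix interpolation the most appropriate way
to proceed is to interpolate along certain directions. This leads to the so-called
tangential interpolation problem (see, e.g., [6, 21, 25]).

More precisely, we are given a set of input/output response measurements specified by
left driving frequencies $\{\mu_j\}_{j=1}^{q} \subset \mathbb{C}$, using left input or
tangential directions $\{\ell_j\}_{j=1}^{q} \subset \mathbb{C}^{p}$, producing left
responses $\{v_j\}_{j=1}^{q} \subset \mathbb{C}^{m}$, and right driving frequencies
$\{\lambda_i\}_{i=1}^{k} \subset \mathbb{C}$, using right input or tangential directions
$\{r_i\}_{i=1}^{k} \subset \mathbb{C}^{m}$, producing right responses
$\{w_i\}_{i=1}^{k} \subset \mathbb{C}^{p}$. We are thus given the left data
$(\mu_j; \ell_j^T, v_j^T)$, $j = 1, \ldots, q$, and the right data
$(\lambda_i; r_i, w_i)$, $i = 1, \ldots, k$. The problem is to find a rational p x m
matrix H(s), such that

\[
H(\lambda_i)\,r_i = w_i, \quad i = 1, \ldots, k, \qquad
\ell_j^T H(\mu_j) = v_j^T, \quad j = 1, \ldots, q.
\tag{6.9}
\]

The left data is rearranged compactly as

\[
M = \begin{bmatrix} \mu_1 & & \\ & \ddots & \\ & & \mu_q \end{bmatrix} \in \mathbb{C}^{q\times q}, \quad
L = \begin{bmatrix} \ell_1^T \\ \vdots \\ \ell_q^T \end{bmatrix} \in \mathbb{C}^{q\times p}, \quad
V = \begin{bmatrix} v_1^T \\ \vdots \\ v_q^T \end{bmatrix} \in \mathbb{C}^{q\times m},
\tag{6.10}
\]

while the right data is rearranged as

\[
\Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_k \end{bmatrix} \in \mathbb{C}^{k\times k}, \quad
\begin{aligned}
R &= [\,r_1 \ \ r_2 \ \cdots\ r_k\,] \in \mathbb{C}^{m\times k}, \\
W &= [\,w_1 \ \ w_2 \ \cdots\ w_k\,] \in \mathbb{C}^{p\times k}.
\end{aligned}
\tag{6.11}
\]

Interpolation points and tangential directions are determined by the problem or are
selected to realize given model reduction goals. For SISO systems, i.e., m = p = 1, left
and right directions can be taken equal to one ($\ell_j = 1$, $r_i = 1$) and hence the
conditions above become

\[
\begin{aligned}
\ell_j^T H(\mu_j) = v_j^T \ &\Rightarrow\ H(\mu_j) = v_j, \quad j = 1, \ldots, q, \\
H(\lambda_i)\,r_i = w_i \ &\Rightarrow\ H(\lambda_i) = w_i, \quad i = 1, \ldots, k.
\end{aligned}
\tag{6.12}
\]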


6.2.5 The Loewner framework 


Given a row array of complex numbers $(\mu_j, v_j)$, $j = 1, \ldots, q$, and a column
array $(\lambda_i, w_i)$, $i = 1, \ldots, k$ (with the $\lambda_i$ and the $\mu_j$
mutually distinct), the associated Loewner matrix is

\[
\mathbb{L} = \begin{bmatrix}
\frac{v_1 - w_1}{\mu_1 - \lambda_1} & \cdots & \frac{v_1 - w_k}{\mu_1 - \lambda_k} \\
\vdots & \ddots & \vdots \\
\frac{v_q - w_1}{\mu_q - \lambda_1} & \cdots & \frac{v_q - w_k}{\mu_q - \lambda_k}
\end{bmatrix} \in \mathbb{C}^{q\times k}.
\]

Definition 6.2. If g is rational, i.e., $g(s) = \frac{p(s)}{q(s)}$ for appropriate
polynomials p, q, the McMillan degree or the complexity of g is
$\deg g = \max\{\deg(p), \deg(q)\}$.


Now, if $w_i = g(\lambda_i)$ and $v_j = g(\mu_j)$ are samples of a rational function g,
the main property of Loewner matrices asserts the following.


Theorem 6.1 ([52]). Let $\mathbb{L}$ be as above. If $k, q \geq \deg g$, then
$\operatorname{rank} \mathbb{L} = \deg g$. In other words, the rank of $\mathbb{L}$ encodes
the complexity of the underlying rational function g. Furthermore, the same result holds
for matrix-valued functions g.
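
Theorem 6.1 can be verified in exact arithmetic on a small case. The sketch below is an illustration, not part of the original text: it samples the degree-2 function $g(s) = s/(s^2+s+1)$ at four right and four left points (an arbitrary choice), assembles the 4 x 4 Loewner matrix with Python's `fractions` module, and computes its rank by Gaussian elimination. The helper names `loewner` and `exact_rank` are hypothetical.

```python
from fractions import Fraction as F

def g(s):
    """Rational function of McMillan degree 2."""
    return s / (s * s + s + 1)

def loewner(mu, v, lam, w):
    """Loewner matrix with entries (v_j - w_i) / (mu_j - lam_i)."""
    return [[(vj - wi) / (mj - li) for li, wi in zip(lam, w)]
            for mj, vj in zip(mu, v)]

def exact_rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    rank, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        pivot = next((i for i in range(rank, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, rows):
            factor = M[i][c] / M[rank][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
        if rank == rows:
            break
    return rank

lam = [F(1, 2), F(1), F(3, 2), F(2)]        # right points
mu = [-x for x in lam]                      # left points
w = [g(x) for x in lam]
v = [g(x) for x in mu]

LL = loewner(mu, v, lam, w)
print(exact_rank(LL))                       # rank of the 4 x 4 Loewner matrix
```

Although the matrix is 4 x 4, its rank equals the degree of g, so the surplus data are detected as redundant.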


6.2.5.1 The Loewner pencil and interpolatory projectors 


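In the sequel we denote the tangential versions of (6.5) and (6.7) by $\mathcal{R}$ and
$\mathcal{O}$, respectively. For arbitrary k and q, these are defined as

\[
\mathcal{R} = \left[ (\lambda_1 E - A)^{-1}Br_1, \ldots, (\lambda_k E - A)^{-1}Br_k \right] \in \mathbb{C}^{n\times k},
\tag{6.13}
\]
\[
\mathcal{O}^T = \left[ (\mu_1 E^T - A^T)^{-1}C^T\ell_1 \ \cdots\ (\mu_q E^T - A^T)^{-1}C^T\ell_q \right] \in \mathbb{C}^{n\times q}.
\tag{6.14}
\]

It readily follows that the reduced quantities $\hat E$ and $\hat A$ form a Loewner
pencil:

\[
\hat E = \mathcal{O}E\mathcal{R} = -\begin{bmatrix}
\frac{v_1^T r_1 - \ell_1^T w_1}{\mu_1 - \lambda_1} & \cdots & \frac{v_1^T r_k - \ell_1^T w_k}{\mu_1 - \lambda_k} \\
\vdots & \ddots & \vdots \\
\frac{v_q^T r_1 - \ell_q^T w_1}{\mu_q - \lambda_1} & \cdots & \frac{v_q^T r_k - \ell_q^T w_k}{\mu_q - \lambda_k}
\end{bmatrix} = -\mathbb{L} \in \mathbb{C}^{q\times k},
\tag{6.15}
\]
\[
\hat A = \mathcal{O}A\mathcal{R} = -\begin{bmatrix}
\frac{\mu_1 v_1^T r_1 - \ell_1^T w_1 \lambda_1}{\mu_1 - \lambda_1} & \cdots & \frac{\mu_1 v_1^T r_k - \ell_1^T w_k \lambda_k}{\mu_1 - \lambda_k} \\
\vdots & \ddots & \vdots \\
\frac{\mu_q v_q^T r_1 - \ell_q^T w_1 \lambda_1}{\mu_q - \lambda_1} & \cdots & \frac{\mu_q v_q^T r_k - \ell_q^T w_k \lambda_k}{\mu_q - \lambda_k}
\end{bmatrix} = -\mathbb{L}_s \in \mathbb{C}^{q\times k},
\tag{6.16}
\]
\[
\hat B = \mathcal{O}B = \begin{bmatrix} v_1^T \\ \vdots \\ v_q^T \end{bmatrix} = V \in \mathbb{C}^{q\times m}, \qquad
\hat C = C\mathcal{R} = [\,w_1 \ \cdots\ w_k\,] = W \in \mathbb{C}^{p\times k}.
\tag{6.17}
\]

The signs in (6.15) and (6.16) follow from the resolvent identity
$(\mu E - A)^{-1}E(\lambda E - A)^{-1} = \left[(\mu E - A)^{-1} - (\lambda E - A)^{-1}\right]/(\lambda - \mu)$.
The resulting quadruple $(W, \mathbb{L}, \mathbb{L}_s, V)$ is called the Loewner quadruple.


Lemma 6.1. Upon multiplication of the (tangential version of the) first equation in (6.8)
with $\mathcal{O}$ on the left and of the second by $\mathcal{R}$ on the right, we obtain

\[
\mathbb{L}_s - \mathbb{L}\Lambda = VR \quad \text{and} \quad \mathbb{L}_s - M\mathbb{L} = LW.
\tag{6.18}
\]

By adding/subtracting appropriate multiples of these expressions it follows that the
Loewner quadruple satisfies the Sylvester equations

\[
M\mathbb{L} - \mathbb{L}\Lambda = VR - LW \quad \text{and} \quad
M\mathbb{L}_s - \mathbb{L}_s\Lambda = MVR - LW\Lambda.
\tag{6.19}
\]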
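
Theorem 6.2. Assume that the pencil $(\mathbb{L}_s, \mathbb{L})$ is regular.² Then
$H(s) = W(\mathbb{L}_s - s\mathbb{L})^{-1}V$ satisfies the tangential interpolation
condition (6.9).


Proof. Multiplying the first Sylvester equation in (6.19) by s and subtracting it from
the second one, we get

\[
M(\mathbb{L}_s - s\mathbb{L}) - (\mathbb{L}_s - s\mathbb{L})\Lambda = (M - sI)VR - LW(\Lambda - sI).
\]

Multiplying this equation by $e_i$ on the right and setting $s = \lambda_i$, we obtain

\[
(M - \lambda_i I)(\mathbb{L}_s - \lambda_i\mathbb{L})e_i = (M - \lambda_i I)Vr_i
\ \Rightarrow\ (\mathbb{L}_s - \lambda_i\mathbb{L})e_i = Vr_i
\ \Rightarrow\ We_i = W(\mathbb{L}_s - \lambda_i\mathbb{L})^{-1}Vr_i.
\]

Thus $w_i = H(\lambda_i)r_i$. Next, we multiply the above equation by $e_j^T$ on the left
and set $s = \mu_j$:

\[
e_j^T(\mathbb{L}_s - \mu_j\mathbb{L})(\Lambda - \mu_j I) = e_j^T LW(\Lambda - \mu_j I)
\ \Rightarrow\ e_j^T(\mathbb{L}_s - \mu_j\mathbb{L}) = \ell_j^T W
\ \Rightarrow\ e_j^T V = \ell_j^T W(\mathbb{L}_s - \mu_j\mathbb{L})^{-1}V.
\]

Thus $v_j^T = \ell_j^T H(\mu_j)$.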
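
The construction in Theorem 6.2 can be tried out on synthetic tangential data. In the sketch below (an illustration, not part of the original text) a hypothetical system with n = 3 states and m = p = 2 ports is sampled along random directions, the Loewner pencil is assembled entrywise from the data only, and the conditions (6.9) are checked. With k = q = n and generic directions the pencil is expected to be regular; the seed, the points and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)                  # arbitrary seed (assumption)
n, m, p = 3, 2, 2
E = np.eye(n)
A = np.diag([-1.0, -2.0, -3.0])
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

def H(s):
    """Transfer function of the underlying (normally unknown) system."""
    return C @ np.linalg.solve(s * E - A, B)

lam = np.array([0.5, 1.5, 2.5])                 # right frequencies
mu = np.array([1.0, 2.0, 3.0])                  # left frequencies
R = rng.standard_normal((m, n))                 # columns r_i
L = rng.standard_normal((n, p))                 # rows l_j^T

W = np.column_stack([H(s) @ R[:, i] for i, s in enumerate(lam)])   # w_i = H(lam_i) r_i
V = np.vstack([L[j] @ H(s) for j, s in enumerate(mu)])             # v_j^T = l_j^T H(mu_j)

# Loewner and shifted Loewner matrices, entry (j, i), from the data only.
denom = mu[:, None] - lam[None, :]
LL = (V @ R - L @ W) / denom
LLs = (np.diag(mu) @ V @ R - L @ W @ np.diag(lam)) / denom

def H_loewner(s):
    """Interpolant W (LLs - s LL)^{-1} V of Theorem 6.2."""
    return W @ np.linalg.solve(LLs - s * LL, V)

right_err = max(np.linalg.norm(H_loewner(s) @ R[:, i] - W[:, i]) for i, s in enumerate(lam))
left_err = max(np.linalg.norm(L[j] @ H_loewner(s) - V[j]) for j, s in enumerate(mu))
print(right_err, left_err)
```

Note that the model is never formed from (E, A, B, C); these matrices are used only to generate the samples.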


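Remark 6.2 (Parametrization of all interpolants of complexity equal to the size of
$\mathbb{L}$). With $K \in \mathbb{C}^{p\times m}$, the Sylvester equations can be
rewritten as

\[
\begin{aligned}
M\mathbb{L} - \mathbb{L}\Lambda &= (V - LK)R - L(W - KR) \quad \text{and} \\
M(\mathbb{L}_s - LKR) - (\mathbb{L}_s - LKR)\Lambda &= M(V - LK)R - L(W - KR)\Lambda.
\end{aligned}
\]

These equations imply that $(\bar W, \mathbb{L}, \bar{\mathbb{L}}_s, \bar V)$ is an
interpolant for all $K \in \mathbb{C}^{p\times m}$, where
$\bar{\mathbb{L}}_s = \mathbb{L}_s - LKR$, $\bar V = V - LK$ and $\bar W = W - KR$.


6.2.5.2 Construction of interpolants


If the pencil $(\mathbb{L}_s, \mathbb{L})$ is regular, then $E = -\mathbb{L}$,
$A = -\mathbb{L}_s$, $B = V$, $C = W$ is a minimal interpolant of the data, i.e.,
$H(s) = W(\mathbb{L}_s - s\mathbb{L})^{-1}V$ interpolates the data. Otherwise, as shown
in [52], problem (6.9) has a solution provided that

\[
\operatorname{rank}\,[\,s\mathbb{L} - \mathbb{L}_s\,]
 = \operatorname{rank}\,[\,\mathbb{L},\ \mathbb{L}_s\,]
 = \operatorname{rank} \begin{bmatrix} \mathbb{L} \\ \mathbb{L}_s \end{bmatrix} = r
\]

for all $s \in \{\lambda_i\} \cup \{\mu_j\}$. Consider then the short SVDs:

\[
[\,\mathbb{L},\ \mathbb{L}_s\,] = Y \hat\Sigma_r \tilde X^*, \qquad
\begin{bmatrix} \mathbb{L} \\ \mathbb{L}_s \end{bmatrix} = \tilde Y \Sigma_r X^*,
\]

where $\hat\Sigma_r, \Sigma_r \in \mathbb{R}^{r\times r}$, $Y \in \mathbb{C}^{q\times r}$,
$X \in \mathbb{C}^{k\times r}$, $\tilde Y \in \mathbb{C}^{2q\times r}$,
$\tilde X \in \mathbb{C}^{2k\times r}$.


2 The pencil $(\mathbb{L}_s, \mathbb{L})$ is called regular if there is at least one value
of $\lambda \in \mathbb{C}$ such that $\det(\mathbb{L}_s - \lambda\mathbb{L}) \neq 0$.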

Remark 6.3. r can be chosen as the numerical rank (as opposed to the exact rank) of
the Loewner pencil. For issues related to the rank, we refer the reader to [2], page 50.


Theorem 6.3. The quadruple $(\hat E, \hat A, \hat B, \hat C)$ of size r x r, r x r, r x m,
p x r, given by

\[
\hat E = -Y^T\mathbb{L}X, \quad \hat A = -Y^T\mathbb{L}_sX, \quad \hat B = Y^TV, \quad \hat C = WX,
\]

is a descriptor realization of an (approximate) interpolant of the data with McMillan
degree $r = \operatorname{rank} \mathbb{L}$.
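
The projection of Theorem 6.3 can be sketched with NumPy for real scalar data. The example below is an assumption-laden illustration, not the authors' code: it oversamples $g(s) = s/(s^2+s+1)$ at four right and four left real points, estimates the numerical rank r from the singular values of $[\mathbb{L}, \mathbb{L}_s]$ with an arbitrarily chosen relative tolerance of 1e-10, and projects with the leading singular vectors.

```python
import numpy as np

def g(s):
    return s / (s**2 + s + 1)

lam = np.array([0.5, 1.0, 1.5, 2.0])            # right points (assumption)
mu = -lam                                       # left points (assumption)
w, v = g(lam), g(mu)

denom = mu[:, None] - lam[None, :]
LL = (v[:, None] - w[None, :]) / denom
LLs = (mu[:, None] * v[:, None] - lam[None, :] * w[None, :]) / denom

# Short SVDs of [LL, LLs] and [LL; LLs]; np.linalg.svd returns X^T as third output.
Y1, s1, _ = np.linalg.svd(np.hstack([LL, LLs]))
_, s2, X2t = np.linalg.svd(np.vstack([LL, LLs]))

r = int(np.sum(s1 > 1e-10 * s1[0]))             # numerical rank
Y, X = Y1[:, :r], X2t[:r, :].T

# Projected descriptor realization of Theorem 6.3 (real data, so Y^T = Y^*).
E, A = -Y.T @ LL @ X, -Y.T @ LLs @ X
B, C = Y.T @ v, w @ X

def H_reduced(s):
    return C @ np.linalg.solve(s * E - A, B)

print(r, H_reduced(3.0), g(3.0))
```

The 4 x 4 pencil is singular, yet the projected 2 x 2 pencil is regular and reproduces g away from the sample points as well.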


Remark 6.4. 

(a) The Loewner approach constructs a descriptor representation
    $(W, \mathbb{L}, \mathbb{L}_s, V)$ of an underlying dynamical system exclusively from
    the data, with no further manipulations involved (i.e., matrix factorizations or
    inversions). In general, the pencil $(\mathbb{L}_s, \mathbb{L})$ is singular and
    needs to be projected to a regular pencil $(\hat A, \hat E)$. However, as shown in
    the mass-spring-damper example in equation (6.22), inversion can be replaced by
    generalized inversion.

(b) As already mentioned, in the Loewner framework, by construction, D-terms are
    absorbed in the other matrices of the realization. Extracting the D-term involves
    an eigenvalue decomposition of $(\mathbb{L}_s, \mathbb{L})$.


6.2.5.3 Interpolation property of reduced systems 


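Given a Loewner quadruple and the projection matrices³ $X, Y \in \mathbb{C}^{N\times k}$,
let the reduced quantities be

\[
\hat{\mathbb{L}} = X^*\mathbb{L}Y, \quad \hat{\mathbb{L}}_s = X^*\mathbb{L}_sY, \quad
\hat V = X^*V, \quad \hat W = WY.
\]

We also consider the projected L and R matrices, namely $\hat L = X^*L$, $\hat R = RY$.
The question which arises is whether these reduced quantities satisfy interpolation
conditions as well. The answer is affirmative and to show this we proceed as follows.

The associated $\hat\Lambda$ and $\hat M$ must satisfy the projected equations resulting
from (6.18), i.e.,

\[
\hat{\mathbb{L}}_s - \hat{\mathbb{L}}\hat\Lambda = \hat V\hat R \quad \text{and} \quad
\hat{\mathbb{L}}_s - \hat M\hat{\mathbb{L}} = \hat L\hat W.
\tag{6.20}
\]

Notice that the projected Loewner pencil is not in Loewner form. To achieve this we
need to diagonalize $\hat\Lambda$ and $\hat M$. For this purpose we compute the following
two generalized eigenvalue decompositions:

\[
[T_\Lambda, D_\Lambda] = \operatorname{eig}(\hat{\mathbb{L}}_s - \hat V\hat R,\ \hat{\mathbb{L}})
\quad \text{and} \quad
[T_M, D_M] = \operatorname{eig}(\hat{\mathbb{L}}_s - \hat L\hat W,\ \hat{\mathbb{L}}).
\]

These decompositions imply

\[
\hat\Lambda = T_\Lambda D_\Lambda T_\Lambda^{-1} \quad \text{and} \quad \hat M = T_M D_M T_M^{-1},
\tag{6.21}
\]

where, for simplicity, it is assumed that the matrices $\hat\Lambda$ and $\hat M$ are
diagonalizable.

It follows that the (diagonal) entries of $D_\Lambda$ and $D_M$ are the right frequencies
and the left frequencies of the reduced system, respectively. Furthermore,
straightforward calculations imply that the remaining quantities are as follows:

\[
\begin{aligned}
\tilde{\mathbb{L}}_s &= T_M^{-1}\hat{\mathbb{L}}_sT_\Lambda, & \tilde{\mathbb{L}} &= T_M^{-1}\hat{\mathbb{L}}T_\Lambda, \\
\tilde V &= T_M^{-1}\hat V, & \tilde L &= T_M^{-1}\hat L, \\
\tilde W &= \hat WT_\Lambda, & \tilde R &= \hat RT_\Lambda.
\end{aligned}
\]

Conclusion: the right/left data triples for the reduced system are
$(D_\Lambda, \tilde W, \tilde R)$ and $(D_M, \tilde V, \tilde L)$, respectively, while the
associated Loewner pencil is $(\tilde{\mathbb{L}}_s, \tilde{\mathbb{L}})$.


3 We call $X, Y \in \mathbb{C}^{N\times k}$ projection matrices as they are used for
defining the projector $X(Y^*X)^{-1}Y^*$.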


6.2.5.4 Real interpolants and reduced models 


Most often the data are collected from real systems. In these cases, if $(s_i, \phi_i)$,
$s_i, \phi_i \in \mathbb{C}$, is a measurement pair, then in order for the
interpolants/reduced models to be real, the complex conjugate pair
$(\bar s_i, \bar\phi_i)$ should also be included. Thus the left/right frequencies,
besides real quantities, contain complex ones appearing in complex conjugate pairs.
For instance, in the SISO (single-input single-output) case, let the real measurement
frequencies be $\sigma_i \in \mathbb{R}$, and the complex ones
$\delta_i \pm j\cdot\omega_i$, where j denotes the imaginary unit. We split them in two
sets, the left and the right ones, respectively, making sure that each set is closed
under complex conjugation:

\[
\begin{aligned}
M &= \{\sigma_i,\ i = 1, \ldots, r_1;\ \ \delta_i \pm j\cdot\omega_i,\ i = 1, \ldots, r_3\}, \\
\Lambda &= \{\sigma_i,\ i = r_1 + 1, \ldots, r_1 + r_2;\ \ \delta_i \pm j\cdot\omega_i,\ i = r_3 + 1, \ldots, r_3 + r_4\}.
\end{aligned}
\]

Thus the left set has $r_1$ real frequencies and $r_3$ complex frequencies together with
their complex conjugates (total $r_1 + 2r_3$ numbers). Similarly the numbers for the right
set are $r_2$ and $r_4$, i.e., it consists of $r_2 + 2r_4$ numbers. The quantities W and V
are assembled in accordance with M and $\Lambda$. In addition let us define the matrices

\[
\begin{aligned}
J_1 &= \operatorname{blkdiag}[\,I_{r_1}, \underbrace{J, \ldots, J}_{r_3 \text{ terms}}\,] \in \mathbb{C}^{(r_1+2r_3)\times(r_1+2r_3)}, \\
J_2 &= \operatorname{blkdiag}[\,I_{r_2}, \underbrace{J, \ldots, J}_{r_4 \text{ terms}}\,] \in \mathbb{C}^{(r_2+2r_4)\times(r_2+2r_4)},
\end{aligned}
\quad \text{where } J = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -j \\ 1 & j \end{bmatrix},
\]

and blkdiag[·] (following Matlab notation) denotes the block diagonal structure. A
simple calculation shows then that the matrices

\[
M_R = J_1^* M J_1, \quad V_R = J_1^* V, \quad L_R = J_1^* L
\]

have real entries. The same happens with the matrices

\[
\Lambda_R = J_2^* \Lambda J_2, \quad W_R = W J_2, \quad R_R = R J_2.
\]

Recall equations (6.18). If we now solve the transformed equations for $\mathbb{L}^R$,
$\mathbb{L}_s^R$:

\[
\mathbb{L}_s^R - \mathbb{L}^R\Lambda_R = V_RR_R \quad \text{and} \quad
\mathbb{L}_s^R - M_R\mathbb{L}^R = L_RW_R,
\]

the resulting pencil $(\mathbb{L}_s^R, \mathbb{L}^R)$ has real entries. Hence the
algorithms based on $\mathbb{L}^R$ and $\mathbb{L}_s^R$ described below yield real reduced
order models.


6.2.5.5 The Loewner algorithms for scalar and matrix rational approximation 


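Next, two algorithms (see Algorithms 6.1 and 6.2) for computing a strictly rational real
interpolant for both the scalar and the matrix interpolation problem are presented.


Algorithm 6.1: Loewner-SISO (Scalar rational approximation) [49].

Input: $S = [s_1, \ldots, s_N] \in \mathbb{C}^N$, $F = [\phi_1, \ldots, \phi_N] \in \mathbb{C}^N$, $N \in \mathbb{N}$.
Output: $\hat E \in \mathbb{R}^{r\times r}$, $\hat A \in \mathbb{R}^{r\times r}$,
$\hat B \in \mathbb{R}^{r\times 1}$, $\hat C \in \mathbb{R}^{1\times r}$ with $r \ll N$.

1. Partition the measurements into two disjoint sets and form the left and right sets
   $(\mu_j, v_j)$, $j = 1, \ldots, q$, and $(\lambda_i, w_i)$, $i = 1, \ldots, k$:

   frequencies: $[s_1, \ldots, s_N] \to [\lambda_1, \ldots, \lambda_k],\ [\mu_1, \ldots, \mu_q]$, $k + q = N$,
   values: $[\phi_1, \ldots, \phi_N] \to [w_1, \ldots, w_k] = W,\ [v_1, \ldots, v_q]^T = V$.

2. Construct the Loewner pencil as

   \[
   \mathbb{L} = \left( \frac{v_i - w_j}{\mu_i - \lambda_j} \right)_{i=1,\ldots,q}^{j=1,\ldots,k}, \qquad
   \mathbb{L}_s = \left( \frac{\mu_i v_i - \lambda_j w_j}{\mu_i - \lambda_j} \right)_{i=1,\ldots,q}^{j=1,\ldots,k}.
   \]

3. It follows that the complex raw model is $\{W, \mathbb{L}, \mathbb{L}_s, V\}$.

4. Transform all the complex data to real; there follows the raw real model
   $\{W_R, \mathbb{L}^R, \mathbb{L}_s^R, V_R\}$.

5. Compute the rank-revealing SVDs $[Y_1, \Sigma_1, X_1] = \mathrm{SVD}([\mathbb{L}^R, \mathbb{L}_s^R])$
   and $[Y_2, \Sigma_2, X_2] = \mathrm{SVD}([\mathbb{L}^R; \mathbb{L}_s^R])$; the decay of the
   singular values leads to the choice of the order r of the approximant.

6. The reduced real model is obtained by projecting the raw real model with
   $Y = Y_1(:, 1\!:\!r)$ and $X = X_2(:, 1\!:\!r)$ as

   \[
   \underbrace{\{W_R, \mathbb{L}^R, \mathbb{L}_s^R, V_R\}}_{\text{singular}}
   \ \xrightarrow{\ \mathrm{SVD}\ }\
   \underbrace{\{W_RX,\ Y^T\mathbb{L}^RX,\ Y^T\mathbb{L}_s^RX,\ Y^TV_R\}}_{\text{regular}}
   = \{\hat C, -\hat E, -\hat A, \hat B\}.
   \]

7. A real approximant of the data is then

   \[
   \hat H(s) = \hat C(s\hat E - \hat A)^{-1}\hat B \approx \phi(s).
   \]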

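Algorithm 6.2: Loewner-MIMO (Matrix rational approximation).

Input: $S = [s_1, \ldots, s_N] \in \mathbb{C}^N$, $F = [\phi_1, \ldots, \phi_N] \in \mathbb{C}^{p\times m\times N}$, $N \in \mathbb{N}$.
Output: $\hat E \in \mathbb{R}^{r\times r}$, $\hat A \in \mathbb{R}^{r\times r}$,
$\hat B \in \mathbb{R}^{r\times m}$, $\hat C \in \mathbb{R}^{p\times r}$ with $r \ll N$.

1. Partition the measurements into two disjoint sets:

   Left data:
   \[
   M = \begin{bmatrix} \mu_1 & & \\ & \ddots & \\ & & \mu_q \end{bmatrix} \in \mathbb{C}^{q\times q}, \quad
   L = \begin{bmatrix} \ell_1^T \\ \vdots \\ \ell_q^T \end{bmatrix} \in \mathbb{C}^{q\times p}, \quad
   V = \begin{bmatrix} v_1^T \\ \vdots \\ v_q^T \end{bmatrix} \in \mathbb{C}^{q\times m},
   \]
   Right data:
   \[
   \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_k \end{bmatrix} \in \mathbb{C}^{k\times k}, \quad
   \begin{aligned}
   R &= [\,r_1, r_2, \ldots, r_k\,] \in \mathbb{C}^{m\times k}, \\
   W &= [\,w_1, w_2, \ldots, w_k\,] \in \mathbb{C}^{p\times k}.
   \end{aligned}
   \]

2. Construct the Loewner pencil as

   \[
   \mathbb{L} = \left( \frac{v_i^T r_j - \ell_i^T w_j}{\mu_i - \lambda_j} \right)_{i=1,\ldots,q}^{j=1,\ldots,k}, \qquad
   \mathbb{L}_s = \left( \frac{\mu_i v_i^T r_j - \lambda_j \ell_i^T w_j}{\mu_i - \lambda_j} \right)_{i=1,\ldots,q}^{j=1,\ldots,k}.
   \]

3. It follows that the complex raw model is $\{W, \mathbb{L}, \mathbb{L}_s, V\}$.

4. Transform all the complex data to real; there follows the real raw model
   $\{W_R, \mathbb{L}^R, \mathbb{L}_s^R, V_R\}$.

5. Compute the rank-revealing SVDs $[Y_1, \Sigma_1, X_1] = \mathrm{SVD}([\mathbb{L}^R, \mathbb{L}_s^R])$
   and $[Y_2, \Sigma_2, X_2] = \mathrm{SVD}([\mathbb{L}^R; \mathbb{L}_s^R])$; the decay of the
   singular values leads to the choice of r.

6. The reduced real model is obtained by projecting the raw real model with
   $Y = Y_1(:, 1\!:\!r)$ and $X = X_2(:, 1\!:\!r)$:

   \[
   \underbrace{\{W_R, \mathbb{L}^R, \mathbb{L}_s^R, V_R\}}_{\text{singular}}
   \ \xrightarrow{\ \mathrm{SVD}\ }\
   \underbrace{\{W_RX,\ Y^T\mathbb{L}^RX,\ Y^T\mathbb{L}_s^RX,\ Y^TV_R\}}_{\text{regular}}
   = \{\hat C, -\hat E, -\hat A, \hat B\}.
   \]

7. A real approximant of the data is

   \[
   \hat H(s) = \hat C(s\hat E - \hat A)^{-1}\hat B \approx \phi(s).
   \]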


6.2.6 Examples 


In this section the theory will be illustrated by means of simple examples. 


Example 6.1 (A spring-mass-damper system). Let m, d, and k denote the mass, damp- 
ing, and stiffness of the spring as in Figure 6.1; let also x(t) denote the displacement 
and F(t) the force applied; the associated differential equation is 


mx(t) + dx(t) + kx(t) = F(t). 


Figure 6.1: A spring—-mass—damper system. 


This is a SISO (single-input single-output) system. By introducing the state variables 
X = X,X) = X, the input u = F, and as output the velocity y = x, the following state 
equations result: 


X(t) = x(t), mo(t) = -kx - dxx(t) + u(t), y(t) = x(t). 


The system matrices are thus 


al ak Rel z pe: c=[0 1], 


and the resulting transfer function is 


S 


H(s) = C(sE - A) B = —_—_—.. 
6) SE ) ms? + ds +k 


In the sequel we will assume for simplicity that all parameters have value one. We 
now wish to recover state equations equivalent to the ones above from measurements 
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of the transfer function. Toward this goal we evaluate the transfer function at the real 
frequencies: A, = j, A, = 1 (right frequencies), as well as py, = -4, Į = -1 (left frequen- 
cies). The corresponding values of H are collected in the matrices 


w=(4 3), VW=(-% 1). 


Furthermore, with R = [1 1] = LĪ, we construct the Loewner pencil: 


-| 


Since the pencil (L,, L) is regular, we recover the original transfer function: 


N 
MID NS 
WIN WIN 
(e) 
n 
Il 
| ae | 
| | 
[sis RIS 
a) 
Ie 
(m) 


Z s 
H(s) = WỌ(s) ly = pe where Ms) = L, - sL. 
s4+s+1 
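Hence, the measurements above yield a minimal (descriptor) realization of the system
in terms of the (state) variables $z_1$, $z_2$:

\[
\begin{aligned}
\tfrac{20}{21}\dot{z}_1(t) + \tfrac{2}{3}\dot{z}_2(t) &= -\tfrac{4}{21}z_1(t) + \tfrac{2}{3}u(t), \\
\tfrac{6}{7}\dot{z}_1(t) + \tfrac{2}{3}\dot{z}_2(t) &= -\tfrac{4}{7}z_1(t) - \tfrac{1}{3}z_2(t) + u(t), \qquad
y(t) = \tfrac{2}{7}z_1(t) + \tfrac{1}{3}z_2(t),
\end{aligned}
\]

with

\[
E = \begin{bmatrix} \tfrac{20}{21} & \tfrac{2}{3} \\ \tfrac{6}{7} & \tfrac{2}{3} \end{bmatrix}, \quad
A = \begin{bmatrix} -\tfrac{4}{21} & 0 \\ -\tfrac{4}{7} & -\tfrac{1}{3} \end{bmatrix}, \quad
B = \begin{bmatrix} \tfrac{2}{3} \\ 1 \end{bmatrix}, \quad
C = \begin{bmatrix} \tfrac{2}{7} & \tfrac{1}{3} \end{bmatrix}.
\]

Multiplying from the left with $E^{-1}$ yields (id: identified system in state-space form)

\[
A_{id} = \begin{bmatrix} 4 & \tfrac{7}{2} \\ -6 & -5 \end{bmatrix}, \quad
B_{id} = \begin{bmatrix} -\tfrac{7}{2} \\ 6 \end{bmatrix}, \quad
C_{id} = \begin{bmatrix} \tfrac{2}{7} & \tfrac{1}{3} \end{bmatrix}.
\]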


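Coordinate transformation. Let the state vector $x$ be transformed to the new state
vector $z$ by the non-singular transformation matrix

\[
\Psi = \begin{bmatrix} C \\ CA \end{bmatrix}^{-1} \begin{bmatrix} C_{id} \\ C_{id}A_{id} \end{bmatrix}
\]

of dimension $2 \times 2$. Then the following hold:

\[
z = \Psi^{-1}x, \quad A_{id} = \Psi^{-1}A\Psi, \quad B_{id} = \Psi^{-1}B, \quad C_{id} = C\Psi;
\]

e.g., $\Psi A_{id}\Psi^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & -1 \end{bmatrix} = A$.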
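As a hedged numerical illustration of this coordinate transformation (assuming NumPy; the helper `obsv` and the suffix `_L` for the Loewner descriptor matrices are names introduced only for this sketch), one can form $\Psi$ from the two observability matrices and confirm that it maps the identified system back to the original one with $m = d = k = 1$:

```python
import numpy as np

# Original system of Example 6.1 with m = d = k = 1 (so E = I).
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([0.0, 1.0])
C = np.array([0.0, 1.0])

# Descriptor realization obtained from the Loewner pencil of the example.
E_L = np.array([[20/21, 2/3], [6/7, 2/3]])
A_L = np.array([[-4/21, 0.0], [-4/7, -1/3]])
B_L = np.array([2/3, 1.0])
C_L = np.array([2/7, 1/3])

# Identified state-space system: multiply from the left with E^{-1}.
A_id = np.linalg.solve(E_L, A_L)
B_id = np.linalg.solve(E_L, B_L)
C_id = C_L

def obsv(C, A):
    """Observability matrix [C; CA] of a second-order SISO system."""
    return np.vstack([C, C @ A])

# Psi = [C; CA]^{-1} [C_id; C_id A_id]
Psi = np.linalg.solve(obsv(C, A), obsv(C_id, A_id))

print(A_id)                                  # identified state matrix
print(Psi @ A_id @ np.linalg.inv(Psi))       # should reproduce A
print(Psi @ B_id, C @ Psi)                   # should reproduce B and C_id
```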
Remark 6.5. The above result shows that the Loewner framework constitutes a data-
driven system identification method which constructs a realization from measurements
only. It is important to mention that the initial and the identified systems are
identical up to a coordinate transformation. Hence the underlying dynamics are
recovered exactly, and the corresponding transfer function remains invariant under
such a transformation.


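The question now arises: what happens if we collect more data than necessary?
Let us consider

\[
\Lambda = \operatorname{diag}\left(\tfrac{1}{2},\ 1,\ \tfrac{3}{2},\ 2\right), \qquad
M = \operatorname{diag}\left(-\tfrac{1}{2},\ -1,\ -\tfrac{3}{2},\ -2\right).
\]

In this case, the associated measurements are

\[
W = \begin{bmatrix} \tfrac{2}{7} & \tfrac{1}{3} & \tfrac{6}{19} & \tfrac{2}{7} \end{bmatrix}, \qquad
V = \begin{bmatrix} -\tfrac{2}{3} & -1 & -\tfrac{6}{7} & -\tfrac{2}{3} \end{bmatrix}^{T},
\]

and with $R = [1\ \ 1\ \ 1\ \ 1] = L^{T}$, the Loewner pencil is

\[
\mathbb{L} = \begin{bmatrix}
\frac{20}{21} & \frac{2}{3} & \frac{28}{57} & \frac{8}{21} \\
\frac{6}{7} & \frac{2}{3} & \frac{10}{19} & \frac{3}{7} \\
\frac{4}{7} & \frac{10}{21} & \frac{52}{133} & \frac{16}{49} \\
\frac{8}{21} & \frac{1}{3} & \frac{16}{57} & \frac{5}{21}
\end{bmatrix}, \qquad
\mathbb{L}_s = \begin{bmatrix}
-\frac{4}{21} & 0 & \frac{4}{57} & \frac{2}{21} \\
-\frac{4}{7} & -\frac{1}{3} & -\frac{4}{19} & -\frac{1}{7} \\
-\frac{4}{7} & -\frac{8}{21} & -\frac{36}{133} & -\frac{10}{49} \\
-\frac{10}{21} & -\frac{1}{3} & -\frac{14}{57} & -\frac{4}{21}
\end{bmatrix}.
\]

It turns out that we can choose arbitrary matrices $X, Y \in \mathbb{R}^{4\times 2}$, provided that
$\det(Y^{T}X) \neq 0$, e.g.

\[
X = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 0 & 0 \\ -2 & 1 \end{bmatrix}, \qquad
Y^{T} = \begin{bmatrix} 0 & 1 & 0 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix},
\]

so that the projected quantities

\[
\hat{W} = WX = \begin{bmatrix} -\tfrac{6}{7} & -\tfrac{1}{21} \end{bmatrix}, \qquad
\hat{\mathbb{L}} = Y^{T}\mathbb{L}X = \begin{bmatrix} -\tfrac{6}{7} & -\tfrac{1}{7} \\ \tfrac{18}{49} & \tfrac{1}{147} \end{bmatrix},
\]
\[
\hat{\mathbb{L}}_s = Y^{T}\mathbb{L}_sX = \begin{bmatrix} 0 & \tfrac{1}{21} \\ -\tfrac{48}{49} & -\tfrac{19}{147} \end{bmatrix}, \qquad
\hat{V} = Y^{T}V = \begin{bmatrix} -\tfrac{1}{3} \\ \tfrac{11}{21} \end{bmatrix},
\]

constitute a minimal realization of $H(s)$:

\[
H(s) = \hat{W}\,(\hat{\mathbb{L}}_s - s\hat{\mathbb{L}})^{-1}\hat{V} = \frac{s}{s^{2} + s + 1}.
\]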


It should be stressed that this holds for arbitrary projection matrices X, Y. 
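The oversampled case can be reproduced in a few lines. This is a sketch under the assumption that NumPy is available; variable names such as `LLt` and `H_proj` are illustrative. It builds the $4 \times 4$ pencil, confirms that its numerical rank equals the McMillan degree 2, and projects with the matrices $X$ and $Y$ given above.

```python
import numpy as np

def H(s):
    return s / (s**2 + s + 1)

lam = np.array([0.5, 1.0, 1.5, 2.0])   # right points
mu = -lam                              # left points
W, V = H(lam), H(mu)

D = mu[:, None] - lam[None, :]
LL = (V[:, None] - W[None, :]) / D
LLs = (mu[:, None] * V[:, None] - lam[None, :] * W[None, :]) / D

# Both matrices are 4 x 4 but have numerical rank 2.
print(np.linalg.matrix_rank(LL, tol=1e-10), np.linalg.matrix_rank(LLs, tol=1e-10))

# Projection matrices from the text (Y is stored as a 4 x 2 matrix).
X = np.array([[-1.0, 0.0], [0.0, -1.0], [0.0, 0.0], [-2.0, 1.0]])
Y = np.array([[0.0, 1.0], [1.0, -1.0], [0.0, -1.0], [-1.0, 1.0]])

Wt, Vt = W @ X, Y.T @ V
LLt, LLst = Y.T @ LL @ X, Y.T @ LLs @ X

def H_proj(s):
    """Transfer function of the projected (2 x 2) Loewner realization."""
    return Wt @ np.linalg.solve(LLst - s * LLt, Vt)

print(abs(H_proj(1j) - H(1j)))   # expected to be at round-off level
```

Replacing `X` and `Y` by other full-rank matrices for which the projected pencil stays regular should leave `H_proj` unchanged, which is the point stressed in the text.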
6.2.6.1 The generalized inverse approach


There is another way to express the above relationship avoiding arbitrary projectors.
Basic ingredients are generalized inverses. This approach will be demonstrated only
for the spring-mass-damper example. However, it holds in general (see, e.g., [3]).

In the sequel, we will make use (only) of the Moore-Penrose generalized inverse,
which is defined as follows. Given the (rectangular) matrix $M \in \mathbb{R}^{q\times k}$, the Moore-
Penrose generalized inverse, denoted by $M^{MP} \in \mathbb{R}^{k\times q}$, satisfies:

(a) $MM^{MP}M = M$,
(b) $M^{MP}MM^{MP} = M^{MP}$,
(c) $[MM^{MP}]^{T} = MM^{MP}$,
(d) $[M^{MP}M]^{T} = M^{MP}M$.

This generalized inverse always exists and is unique.

In the sequel we will be concerned with rectangular $q \times k$ polynomial matrices
which have an explicit (rank-revealing) factorization as follows:

\[
M = X\Delta Y^{T},
\]

where $X$, $\Delta$, $Y$ have dimension $q \times n$, $n \times n$, $k \times n$, with $n \le q, k$, and all have full rank $n$. In
such cases, the Moore-Penrose generalized inverse is

\[
M^{MP} = Y(Y^{T}Y)^{-1}\Delta^{-1}(X^{T}X)^{-1}X^{T}.
\]
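The closed-form expression for $M^{MP}$ can be tested against the four defining conditions. The sketch below assumes NumPy; the small integer matrices `X`, `Y`, and `Delta` are arbitrary full-rank test data chosen for this illustration, not taken from the chapter.

```python
import numpy as np

# Arbitrary full-rank factors (q = 5, k = 4, n = 2), chosen only for illustration.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.0, 3.0]])   # q x n
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [2.0, 1.0]])               # k x n
Delta = np.array([[2.0, 1.0], [0.0, 3.0]])                                    # n x n

M = X @ Delta @ Y.T   # q x k matrix of rank n

# Moore-Penrose inverse from the rank-revealing factorization.
M_mp = Y @ np.linalg.inv(Y.T @ Y) @ np.linalg.inv(Delta) @ np.linalg.inv(X.T @ X) @ X.T

penrose = {
    "(a)": np.allclose(M @ M_mp @ M, M),
    "(b)": np.allclose(M_mp @ M @ M_mp, M_mp),
    "(c)": np.allclose((M @ M_mp).T, M @ M_mp),
    "(d)": np.allclose((M_mp @ M).T, M_mp @ M),
}
print(penrose)

# Cross-check with NumPy's SVD-based pseudo-inverse (explicit cutoff 1e-10).
print(np.allclose(M_mp, np.linalg.pinv(M, 1e-10)))
```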


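Mass-spring-damper example (continued). The quantity needed is the generalized
inverse of

\[
\Phi(s) = \mathbb{L}_s - s\mathbb{L} =
\begin{bmatrix}
-\frac{20s}{21}-\frac{4}{21} & -\frac{2s}{3} & -\frac{28s}{57}+\frac{4}{57} & -\frac{8s}{21}+\frac{2}{21} \\[2pt]
-\frac{6s}{7}-\frac{4}{7} & -\frac{2s}{3}-\frac{1}{3} & -\frac{10s}{19}-\frac{4}{19} & -\frac{3s}{7}-\frac{1}{7} \\[2pt]
-\frac{4s}{7}-\frac{4}{7} & -\frac{10s}{21}-\frac{8}{21} & -\frac{52s}{133}-\frac{36}{133} & -\frac{16s}{49}-\frac{10}{49} \\[2pt]
-\frac{8s}{21}-\frac{10}{21} & -\frac{s}{3}-\frac{1}{3} & -\frac{16s}{57}-\frac{14}{57} & -\frac{5s}{21}-\frac{4}{21}
\end{bmatrix}. \tag{6.22}
\]

We first notice that $\Phi(s) = X\Delta(s)Y^{T}$, where $X$ and $Y$ can be chosen as follows:

\[
X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -\frac{3}{7} & \frac{8}{7} \\ -\frac{1}{2} & 1 \end{bmatrix}, \qquad
Y = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -\frac{7}{19} & \frac{24}{19} \\ -\frac{1}{2} & \frac{9}{7} \end{bmatrix}, \qquad
\det(Y^{T}X) \neq 0.
\]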
Thus by taking the $2 \times 2$ upper-left block as $\Delta(s) = \Phi(1{:}2, 1{:}2)(s)$, it follows that

\[
\Phi(s)^{MP} = \frac{1}{80989667\,(s^{2}+s+1)}\,Z(s), \quad \text{where } Z(s) = \begin{bmatrix} Z_1(s) & Z_2(s) \end{bmatrix},
\]

\[
Z_1(s) = \begin{bmatrix}
-28(11610185s + 7274073) & 14(3558666s - 5604037) \\
294(225182s + 281171) & -147(192415s - 19668) \\
3724(54617s + 48189) & -1862(29046s - 17485) \\
98(2527157s + 2123670) & -49(1250553s - 876439)
\end{bmatrix},
\]

\[
Z_2(s) = \begin{bmatrix}
6076(32301s - 391) & 14(15168851s + 1670036) \\
-2058(29494s + 15609) & -147(417597s + 261503) \\
-26068(5715s + 1523) & -1862(83663s + 30704) \\
-98(1797669s + 409322) & -49(3777710s + 1247231)
\end{bmatrix}.
\]


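In the rectangular case, where there are two right measurements less, i.e., we only
have $\Lambda = \operatorname{diag}[\tfrac{1}{2}, 1]$, while $M$ remains the same, the right values are $W = W(:, 1{:}2)$;
hence

\[
\Phi(s) = \mathbb{L}_s - s\mathbb{L} =
\begin{bmatrix}
-\frac{20s}{21}-\frac{4}{21} & -\frac{2s}{3} \\[2pt]
-\frac{6s}{7}-\frac{4}{7} & -\frac{2s}{3}-\frac{1}{3} \\[2pt]
-\frac{4s}{7}-\frac{4}{7} & -\frac{10s}{21}-\frac{8}{21} \\[2pt]
-\frac{8s}{21}-\frac{10}{21} & -\frac{s}{3}-\frac{1}{3}
\end{bmatrix}
= X\Delta(s)Y^{T}
\]

has dimension $4 \times 2$, where $Y = Y(1{:}2, 1{:}2)$. In this case the Moore-Penrose inverse
is

\[
\Phi(s)^{MP} = \frac{1}{737\,(s^{2}+s+1)}
\begin{bmatrix}
-4767s - 3402 & \frac{1827s - 2037}{2} & 3087s + 294 & 3297s + \frac{1365}{2} \\[2pt]
5838s + 5250 & -1596s + 903 & -4326s - 1218 & -4515s - 1722
\end{bmatrix},
\]

which implies the desired equality

\[
W\,\Phi(s)^{MP}\,V = \frac{s}{s^{2}+s+1} = H(s).
\]

Conclusion: the Loewner framework allows the definition of rectangular and singular
systems.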
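The rectangular case lends itself to a direct numerical check. The following sketch assumes NumPy and uses its SVD-based `np.linalg.pinv` in place of the closed-form factorization; the function name `H_pinv` is introduced only here. It evaluates $W\,\Phi(s)^{MP}V$ for the $4 \times 2$ pencil and compares it with $H(s)$.

```python
import numpy as np

def H(s):
    return s / (s**2 + s + 1)

lam = np.array([0.5, 1.0])                 # two right points
mu = np.array([-0.5, -1.0, -1.5, -2.0])    # four left points
W, V = H(lam), H(mu)

D = mu[:, None] - lam[None, :]
LL = (V[:, None] - W[None, :]) / D                                  # 4 x 2
LLs = (mu[:, None] * V[:, None] - lam[None, :] * W[None, :]) / D    # 4 x 2

def H_pinv(s):
    """Evaluate W * pinv(LLs - s*LL) * V for the rectangular pencil."""
    Phi = LLs - s * LL
    return W @ np.linalg.pinv(Phi) @ V

s0 = 0.7
print(H_pinv(s0), H(s0))                   # the two values should agree

# Entry (1,1) of the pseudo-inverse, scaled by 737 (s^2 + s + 1).
entry = np.linalg.pinv(LLs - s0 * LL)[0, 0] * 737 * (s0**2 + s0 + 1)
print(entry, -4767 * s0 - 3402)
```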


Example 6.2 (Reduction of a 10th order band-stop filter). The system has two inputs 
and two outputs (MIMO), state-space dimension 10, and a D term of rank 2. A state- 
space representation is as follows: 


$\Sigma$: $\dot{x}(t) = Ax(t) + Bu(t)$, $y(t) = Cx(t) + Du(t)$, where
1 1 1 1 1 1 
-3 3 3 2 2/71 0 0 O Ọ 2 -3 
1 1 1 1 1 1 
-3 73 73 3 23/0 1 0 0 0 3 73 
1 1 1 1 1 1 1 
2 2 2 2 27/9 9 -1 0 0 2 3 
1 1 1 1 1 1 
-3 3 73 ge 72/0 0 0 -1 0 2 3 
1 1 1 1 1 1 
Pel iat a ee ae a 0 0 0 -1 HE E. 
1 0 0 0 0o 0o 0 0 0 0 0 | 
o 1 0 0 0/0 00 0 0 0 0 
o o 1 0 00 0 0 0 O0 0 0 
o 0o o 1 00 0 0 0 O 0 0 
o 0o 0o o 10 0 0 0 O0 0 0 
1 1 1 1 1 1 f 
-1 -1 1 1 1joo0000 kied 
= 2 92 2 2 9 |2 2 
Eee ce ae eee p-|7 | 
2 2 2 2 2 2 2 


The transfer function is a $2 \times 2$ rational matrix given by

\[
H(s) = \frac{1}{d(s)} \begin{bmatrix} n_1(s) & n_2(s) \\ -n_2(s) & -n_1(s) \end{bmatrix} + D, \quad \text{where}
\]
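\[
\begin{aligned}
n_1(s) &= s\,(s^{8} + 7s^{6} + 13s^{4} + 7s^{2} + 1), \\
n_2(s) &= s\,(5s^{8} + 6s^{7} + 25s^{6} + 20s^{5} + 41s^{4} + 20s^{3} + 25s^{2} + 6s + 5), \\
d(s) &= 2\,(s^{4} + s^{3} + 3s^{2} + 2s + 1)(2s^{6} + 3s^{5} + 7s^{4} + 7s^{3} + 7s^{2} + 3s + 2).
\end{aligned}
\]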
It readily follows that $\lim_{s\to\infty} H(s) = D$. We take $N = 100$ samples of the transfer
function on the imaginary axis (frequency response measurements) between $10^{-1}$ and $10^{1}$
rad/sec. Figure 6.3 (left) shows the first 20 normalized singular values of the resulting
real Loewner pencil (the rest are numerically zero). The rank of $\mathbb{L}$ is 10 (the McMillan
degree of the system) while the rank of $\mathbb{L}_s$ is 12 (= rank $\mathbb{L}$ + rank $D$). The right pane in
Figure 6.2 shows that we can obtain a perfect fit (total recovery of the model) with the
Loewner framework for this MIMO example only by sampling the transfer function.
As both Gramians satisfy $\mathcal{P} = \mathcal{Q}$, i.e., they are equal, and are a multiple of the identity
matrix, the Hankel singular values (see [2]) are all equal; this makes reduction with
balanced truncation not feasible.

The right pane in Figure 6.3 shows the poles of the system obtained by means of
the Loewner framework along with the zeros for every entry. The right pane in Fig-
ure 6.2 shows the band-stop character around frequency $\omega_0 = 1$ rad/s of entries (1, 2)
and (2, 1).
[Figure panels: RLC entries (1,1), (2,1), (1,2), and (2,2) versus frequency in rad/s, together with the
Loewner approximants of order 12.]

Figure 6.2: Left pane: Shows the 100 measurements sampled with DNS (Direct Numerical Simula-
tions) of the theoretical $(2 \times 2)$-matrix transfer function. Right pane: Loewner approximants.

[Figure panels: "The Loewner singular value decay with D ≠ 0"; "Pole / Zero Diagram" (real versus
imaginary part).]

Figure 6.3: Left pane: Shows the first 12 singular values while the rest are numerically zero. Right
pane: Pole/Zero diagram.


Computing the poles of the Loewner model confirms the accuracy of the approach.
Consider the following:

eig(A)                                            eig(A_r, E_r)
-0.0181885913675508 - 0.745231200229 i            -0.0181885913675508 - 0.745231200229 i
-0.0181885913675508 + 0.745231200229 i            -0.0181885913675508 + 0.745231200229 i
-0.148402943598342 - 0.632502179219046 i          -0.148402943598342 - 0.632502179219046 i
-0.148402943598342 + 0.632502179219046 i          -0.148402943598342 + 0.632502179219046 i
-0.699080475814867 - 0.715042997542469 i          -0.699080475814867 - 0.715042997542469 i
-0.699080475814867 + 0.715042997542469 i          -0.699080475814867 + 0.715042997542469 i
-0.0327309328175858 - 1.34106659803138 i          -0.0327309328175858 - 1.34106659803138 i
-0.0327309328175858 + 1.34106659803138 i          -0.0327309328175858 + 1.34106659803138 i
-0.351597056401658 - 1.49852758300335 i           -0.351597056401658 - 1.49852758300335 i
-0.351597056401658 + 1.49852758300335 i           -0.351597056401658 + 1.49852758300335 i
                                                  ∞
                                                  ∞

As can be observed from this table, the Loewner method computes, besides the fi-
nite poles, two poles at infinity. This happens because in the Loewner framework the
D-term is incorporated in the remaining matrices of the realization.




6.2.7 Summary 


Recall Section 6.2.5. The following result summarizes the cases which arise in the 
Loewner framework, depending on the amount of data available. 


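Lemma 6.2. Given is a scalar transfer function $H(s)$ of McMillan degree $n$.

1. Amount of data less than $2n$. For $q = k < n$, define the transfer function
   $\hat{H}(s) = \hat{C}(s\hat{E} - \hat{A})^{-1}\hat{B}$ by means of the Loewner procedure. The interpolation
   conditions below are satisfied:

   \[
   \hat{H}(\mu_i) = H(\mu_i) \quad \text{and} \quad \hat{H}(\lambda_i) = H(\lambda_i) \quad \text{for } i = 1,\dots,k.
   \]

   If $k = q = n$, the Loewner quadruple is equivalent to the original one $(C, E, A, B)$.

2. Arbitrary amount of data, no reduction. For arbitrary $k$ and $q$ (i.e., $k, q < n$ or
   $k, q \ge n$) the Loewner quadruple interpolates the data, even if the pencil $(\mathbb{L}_s, \mathbb{L})$ is
   singular. This is to be interpreted as follows:

   \[
   (\mathbb{L}_s - \lambda_i\mathbb{L})\,e_i = V \quad \text{and} \quad e_j^{T}(\mathbb{L}_s - \mu_j\mathbb{L}) = W.
   \]

   Hence $We_i = w_i$, $i = 1,\dots,k$, and $e_j^{T}V = v_j$, $j = 1,\dots,q$. Therefore the transfer
   function of the Loewner pencil interpolates $H(s)$ at the left and right interpolation
   points.

3. Arbitrary amount of data, followed by reduction. If $k, q \ge n$, consider the rank-
   revealing SVD decompositions:

   \[
   \begin{bmatrix} \mathbb{L} & \mathbb{L}_s \end{bmatrix} = Y_r\Sigma_r\tilde{X}^{*} \quad \text{and} \quad
   \begin{bmatrix} \mathbb{L} \\ \mathbb{L}_s \end{bmatrix} = \tilde{Y}\tilde{\Sigma}_rX_r^{*},
   \]

   where $Y_r \in \mathbb{R}^{q\times r}$, $X_r \in \mathbb{R}^{k\times r}$, and $r \le k, q$ is the exact or the numerical rank of the
   Loewner pencil involved. Let

   \[
   \hat{E} = -Y_r^{T}\mathbb{L}X_r, \quad \hat{A} = -Y_r^{T}\mathbb{L}_sX_r \in \mathbb{C}^{r\times r}, \quad
   \hat{B} = Y_r^{T}V \in \mathbb{C}^{r}, \quad \hat{C} = WX_r \in \mathbb{C}^{1\times r}.
   \]

   Then the following approximate interpolation conditions are satisfied:

   \[
   \hat{H}(\mu_i) \approx H(\mu_i), \ i = 1,\dots,q, \quad \text{and} \quad \hat{H}(\lambda_j) \approx H(\lambda_j), \ j = 1,\dots,k.
   \]

   In addition, the reduced system satisfies (exact) interpolation conditions as shown
   in Section 6.2.5.3.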
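Case 3 of the lemma can be sketched as follows, assuming NumPy and real interpolation points; the sign convention $\hat{E} = -Y_r^{T}\mathbb{L}X_r$, $\hat{A} = -Y_r^{T}\mathbb{L}_sX_r$ and the relative rank tolerance `1e-10` are assumptions of this illustration. Redundant data from the degree-2 system $H(s) = s/(s^2+s+1)$ are compressed to a model of order $r = 2$.

```python
import numpy as np

def H(s):
    return s / (s**2 + s + 1)

# Ten right and ten left real points: more data than necessary for a degree-2 system.
lam = np.linspace(0.5, 5.0, 10)
mu = -np.linspace(0.25, 4.75, 10)
W, V = H(lam), H(mu)

D = mu[:, None] - lam[None, :]
LL = (V[:, None] - W[None, :]) / D
LLs = (mu[:, None] * V[:, None] - lam[None, :] * W[None, :]) / D

# Rank-revealing SVDs of [LL LLs] (left projector) and [LL; LLs] (right projector).
Y_full, sv, _ = np.linalg.svd(np.hstack([LL, LLs]))
_, _, Xh_full = np.linalg.svd(np.vstack([LL, LLs]))

r = int(np.sum(sv / sv[0] > 1e-10))     # numerical rank (assumed relative tolerance)
Yr = Y_full[:, :r]
Xr = Xh_full[:r, :].T

# Reduced Loewner quadruple.
E = -Yr.T @ LL @ Xr
A = -Yr.T @ LLs @ Xr
B = Yr.T @ V
C = W @ Xr

def H_red(s):
    """Transfer function of the reduced model, C (sE - A)^{-1} B."""
    return C @ np.linalg.solve(s * E - A, B)

print(r)
print(abs(H_red(2j) - H(2j)))           # expected to be at round-off level
```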


6.3 Practical considerations 


This section deals with some key aspects of the Loewner framework from a practical
point of view.
- The factorization of the Loewner matrix into low-rank factor matrices:
  1. Through the singular value decomposition (SVD).
  2. Through the CUR decomposition.
- The choices involving the measurements used in the framework:
  1. Distribution of the interpolation points.
  2. Partition of the interpolation points.

6.3.1 The singular value decomposition 


The SVD is arguably one of the most useful and commonly used tools in numerical
linear algebra. It is listed as one of the main matrix decompositions and can be effi-
ciently computed through various numerically stable algorithms. It is widely used in
dimension reduction and approximation methods for high-dimensional problems.

Any complex-valued matrix $A \in \mathbb{C}^{n\times m}$ has a singular value decomposition given
by $A = Y\Sigma X^{*}$, where $Y \in \mathbb{C}^{n\times n}$ and $X \in \mathbb{C}^{m\times m}$ are unitary matrices, i.e., $Y^{*}Y = I_n$ and
$X^{*}X = I_m$. The left and right singular vectors are collected as columns of the matrices $Y$
and $X$, respectively.

Additionally, the matrix $\Sigma \in \mathbb{R}^{n\times m}$ is defined by $\Sigma_{i,i} = \sigma_i$ and zero elsewhere. Here,
the ordered non-negative scalars $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ are the singular values (for
$n \le m$).

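In what follows, it is assumed that the matrix $A$ has low rank, i.e., $\operatorname{rank}(A) = r < n \le m$.
Let $k$ be a positive integer so that $k < r$. The singular value decomposition of the
matrix $A$ can be additively split as follows:

\[
A = Y\Sigma X^{*} = \begin{bmatrix} Y_k & Y_{n-k} \end{bmatrix}
\begin{bmatrix} \Sigma_k & 0_{k,m-k} \\ 0_{n-k,k} & \Sigma_{n-k,m-k} \end{bmatrix}
\begin{bmatrix} X_k^{*} \\ X_{m-k}^{*} \end{bmatrix} \tag{6.23}
\]

\[
\phantom{A} = \underbrace{Y_k\Sigma_kX_k^{*}}_{=:A_k} + Y_{n-k}\Sigma_{n-k,m-k}X_{m-k}^{*}, \tag{6.24}
\]

where $Y_k \in \mathbb{C}^{n\times k}$, $\Sigma_k \in \mathbb{R}^{k\times k}$, and $X_k \in \mathbb{C}^{m\times k}$. Note that the matrix $A_k := Y_k\Sigma_kX_k^{*} \in \mathbb{C}^{n\times m}$
can be written in terms of the truncated dyadic decomposition, i.e., $A_k = \sum_{i=1}^{k}\sigma_i y_i x_i^{*}$,
where $y_i$ and $x_i$ are the $i$th columns of the matrices $Y_k$ and $X_k$, respectively.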

A problem of interest is to approximate the original matrix $A$ with a rank $k$ ma-
trix $T$, so that the approximation error is minimal with respect to the 2-induced norm
or to the Frobenius norm.

From the Schmidt-Eckart-Young-Mirsky theorem (see Theorem 3.6 in [2]), it fol-
lows that (given $\sigma_k > \sigma_{k+1}$)

\[
\min_{\operatorname{rank}(T) \le k} \|A - T\|_2 = \sigma_{k+1}(A). \tag{6.25}
\]

Moreover, it turns out that a solution to the minimization problem in (6.25) is given
by $T = A_k$ (in the 2-induced norm the minimizer is in general not unique). If we replace
the 2-induced norm with the Frobenius norm, it follows that

\[
\min_{\operatorname{rank}(T) \le k} \|A - T\|_F = \Big( \sum_{i=k+1}^{n} \sigma_i^{2} \Big)^{1/2}. \tag{6.26}
\]

In this case, the unique solution to the minimization problem in (6.26) is again given
by $T = A_k$. For more details on the singular value decomposition (SVD) we refer the
reader to [2], pages 31-41.
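The two error formulas (6.25) and (6.26) are easy to confirm numerically. The sketch below assumes NumPy and uses a random real test matrix, which is purely illustrative; it forms the truncated SVD $A_k$ and compares the approximation errors with the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 8))          # n = 6, m = 8 (illustrative test matrix)

# Economy-size SVD: A = Y diag(sigma) Xh, with Xh playing the role of X^*.
Y, sigma, Xh = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (Y[:, :k] * sigma[:k]) @ Xh[:k, :]  # truncated dyadic sum of the k leading triples

err_2 = np.linalg.norm(A - A_k, 2)        # 2-induced norm of the error
err_F = np.linalg.norm(A - A_k, "fro")    # Frobenius norm of the error

print(err_2, sigma[k])                              # (6.25): equal to sigma_{k+1}
print(err_F, np.sqrt(np.sum(sigma[k:] ** 2)))       # (6.26)
```

Note that `sigma[k]` is $\sigma_{k+1}$ because of zero-based indexing.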

The advantage of the SVD is that it offers optimal low-rank solutions in both the
2-induced and the Frobenius norms. At the same time, one disadvantage is that the
method (full SVD) has cubic complexity with respect to min(m, n) (in the classical
set-up when applied to dense matrices). Taking this possible downside into account,
we seek ways of circumventing the classical SVD and investigate other matrix
decompositions. It is worth mentioning that the cost can be lower than cubic when
only a low-rank approximation is computed with an iterative algorithm. Later in this
chapter, a randomized version of the SVD (r-SVD) illustrates this behavior.


6.3.2 The CUR decomposition 


A challenging aspect of data-driven approximation methods is the choice of a rele- 
vant and meaningful low dimensional subset of the (usually large-scale) data set. In 
some cases, this subset can be used to preserve relevant characteristics of the dynam- 
ics for the model described by the original data. In this framework, it is of interest to 
devise procedures that can extract relevant information from large-scale sets of mea- 
surements. The end goal is to construct reduced order models suitable for tasks such 
as control, design, and simulation. 

Nowadays, the dimension of data sets for various applications can easily reach 
=~ O(10°). In such cases, computing the SVD of large and full matrices becomes pro- 
hibitive. 

One appealing alternative is the so-called CUR decomposition. As before, the goal
is to approximate the original matrix $A \in \mathbb{C}^{n\times m}$ by a product of three low-rank matrices,
$A \approx CUR$. Here, the columns of the matrix $C \in \mathbb{C}^{n\times r}$ represent a subset of the
columns of $A$, while the rows of the matrix $R \in \mathbb{C}^{r\times m}$ form a subset of the rows of $A$.
Finally, the matrix $U \in \mathbb{C}^{r\times r}$ is constructed such that the factorization $A \approx CUR$ holds.

In this new set-up, the left and right singular vectors appearing in the SVD are 
replaced by columns and rows of the initial matrix A. Hence, the CUR factorization 
provides a way of identifying important sets of rows and columns of a matrix A. 

For more details on the CUR decomposition and some of its applications, we refer 
the reader to [19, 26, 27, 28, 47, 51, 54, 63]. 

The CUR factorization is hence an important tool for analyzing large-scale data
sets which offers the following advantages over the SVD:
1. If the matrix A is sparse, then the matrices C and R are also sparse (unlike the
matrices X and Y in the SVD approach).

2. The CUR factorization computes an approximation of A in terms of some of the 
rows and some of the columns of A. In contrast, the SVD computes approximants 
in terms of linear combinations of orthonormal bases generated by the rows and 
columns of A. 

3. Consider A € R", m > n. The complexity for computing the full SVD of A is O(n?) 
flops, using for instance the QR factorization, O(mn7) flops, using iterative meth- 
ods as in ARPACK, and O((m + n)k) flops per iteration, for approximate incremen- 
tal methods where the k dominant singular triples are determined approximately 
(for details see [12]). On the other hand the CUR factorization of order k requires 
O(k? + k?(m + n)) flops per iteration (for details see [47]). 


6.3.2.1 CUR approximation of the Loewner matrix 


In this section, we apply the CUR factorization to the Loewner matrix. We follow [47], 
where CUR is applied to Hankel matrices instead. 


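Definition 6.3. With $\mathbb{L} \in \mathbb{R}^{n\times n}$, let $\mathcal{I} = \{i_1,\dots,i_r\}$ and $\mathcal{J} = \{j_1,\dots,j_r\}$ denote the $r$-
subsets ($r \ll n$) of row and column indices, respectively. If $(\cdot)^{MP}$ denotes the pseudo
inverse, then the CUR factorization of the Loewner matrix $\mathbb{L}$ is given by

\[
\mathbb{L}_r := \underbrace{\mathbb{L}(:,\mathcal{J})}_{\mathcal{J}\text{-columns}}\ \mathbb{L}(\mathcal{I},\mathcal{J})^{MP}\ \underbrace{\mathbb{L}(\mathcal{I},:)}_{\mathcal{I}\text{-rows}}. \tag{6.27}
\]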
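Formula (6.27) translates almost literally into array indexing. The following is a minimal sketch, assuming NumPy; the function name `cur_factorization` and the hand-picked index sets are illustrative, whereas the chapter selects the indices adaptively. For a Loewner matrix of exact rank 2, two rows and two columns already reproduce the matrix.

```python
import numpy as np

def cur_factorization(L, rows, cols):
    """Return L(:, J) @ pinv(L(I, J)) @ L(I, :) for index lists I = rows, J = cols."""
    C = L[:, cols]                               # selected columns
    U = np.linalg.pinv(L[np.ix_(rows, cols)])    # pseudo-inverse of the intersection block
    R = L[rows, :]                               # selected rows
    return C @ U @ R

def H(s):
    return s / (s**2 + s + 1)

# Loewner matrix built from 6 + 6 samples of a degree-2 transfer function (rank 2).
lam = np.linspace(0.5, 3.0, 6)
mu = -np.linspace(0.25, 2.75, 6)
L = (H(mu)[:, None] - H(lam)[None, :]) / (mu[:, None] - lam[None, :])

L_r = cur_factorization(L, rows=[0, 3], cols=[1, 4])
print(np.linalg.norm(L - L_r))   # expected to be at round-off level
```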


In practical applications, large-scale data matrices are only approximately of low
rank (for instance, when the data are corrupted by noise). In this case, the sets $\mathcal{I}$ and
$\mathcal{J}$ need to be chosen in such a way that the approximation error $\|\mathbb{L} - \mathbb{L}_r\|$ is small.
Many approaches for selecting the sets of rows and columns have been proposed. In
the following we mention only some of them.

1. Selection based on a maximum volume sub-matrix in [54].
2. Selection based on minimizing the approximation error in the Chebyshev norm
   ("skeleton" approximation) in [26, 27].
3. Procedure based on the "cross-approximation" algorithm in [55].
4. Selection based on a discrete empirical interpolation method (DEIM) approach in
   [63].


6.3.2.2 The Loewner CUR algorithm 


We introduce a data-driven approximation algorithm for the SISO case based on the CUR
approach. This constructs a reduced order model by means of an adaptive selection of
the rows and columns via the cross-approximation algorithm in [55]. The steps of the
procedure are included in Algorithm 6.3.

Algorithm 6.3: Loewner CUR-cross-approximation based - SISO [44].

Input: $S = [s_1,\dots,s_N] \in \mathbb{C}^{N}$, $F = [\phi_1 \dots \phi_N] \in \mathbb{C}^{N}$ with $N, r \in \mathbb{N}$, and tolerance
values $\delta$, $\epsilon$.
Output: $\hat{E} \in \mathbb{R}^{r\times r}$, $\hat{A} \in \mathbb{R}^{r\times r}$, $\hat{B} \in \mathbb{R}^{r\times 1}$, $\hat{C} \in \mathbb{R}^{1\times r}$ with $r \ll N$.

1. Form the left and right sets as $(\mu_j, v_j)$, $j = 1,\dots,q$, and $(\lambda_i, w_i)$, $i = 1,\dots,k$.
2. Form the Loewner matrices $\mathbb{L}$ and $\mathbb{L}_s$ as in Algorithm 6.1, step 2.
3. Transform all the complex data to real as explained in Section 6.2.5.4.
4. $\mathcal{J}_0 = [j_1 \dots j_r] \subset \mathcal{J}_n$, an initial set of column indices.
5. $[\mathcal{I}_r, \sim, \sim] = \text{crossapprox}([\mathbb{L}\ \ \mathbb{L}_s], \mathcal{J}_0, \delta, \epsilon)$.
6. $[\sim, \mathcal{J}_r, \sim] = \text{crossapprox}([\mathbb{L};\ \mathbb{L}_s], \mathcal{I}_r, \delta, \epsilon)$.
7. $\hat{E} = -\mathbb{L}(\mathcal{I}_r, \mathcal{J}_r)$, $\hat{A} = -\mathbb{L}_s(\mathcal{I}_r, \mathcal{J}_r)$, $\hat{B} = V(\mathcal{I}_r)$, $\hat{C} = W(\mathcal{J}_r)$.
8. The rational approximant is given by $\hat{H}_r(s) = \hat{C}(s\hat{E} - \hat{A})^{-1}\hat{B}$.

For the practical implementation of the function "crossapprox", used in steps 5 and
6 of the above algorithm, we refer the reader to Algorithm 1 in [47], or to the original
reference [55].


Remark 6.6. Instead of using the cross-approximation algorithm, one can use the
DEIM (Discrete Empirical Interpolation Method) algorithm from [63]. In that case,
steps 5 and 6 in Algorithm 6.3 need to be modified accordingly: singular value
decompositions are performed in order to construct the left and right singular vector
matrices to which the DEIM procedure is applied. In order to avoid the SVD, an
incremental QR factorization can be used instead, as proposed in [63].


Remark 6.7. The CUR factorization directly reveals the dominant rows/columns of the
data, while the SVD does not. More precisely, the leading singular vectors give only
linear combinations of the underlying features, whereas with the CUR one gets an
actual subset of the initial features (columns) together with the corresponding rows.
Consequently, a first benefit of the CUR is that it preserves the physical meaning and
structure of the initial data. Another advantage is that sparsity is preserved.


6.3.3 Choice of left and right interpolation points 


This section deals with the problem of selecting the initial interpolation points in the
Loewner framework. More specifically, we investigate how the choice of the initial in-
terpolation points affects the quality of the reduced order model. We take into consid-
eration different point distributions in 1D or in 2D.
Moreover, several splitting techniques are analyzed. These are related to the par-
tition of the data set into two disjoint subsets, which is performed in the beginning of
the algorithms in the Loewner framework.


6.3.3.1 Distribution of the interpolation points 


We present various distributions of the initial interpolation points for the one-dimen- 
sional case (1D) as well as for the two-dimensional case (2D). 


1D interpolation grid        2D interpolation grid
equispaced                   equispaced ("same areas")
logarithmically spaced       logarithmically spaced
Chebyshev nodes              Padua points
uniformly random             uniformly random


In Figure 6.4, we depict different distributions of initial interpolation points. One way 
of selecting points is that of equispaced or linearly spaced points, commonly used for 
Fourier analysis. This represents a natural choice because of the usage of trigonomet- 
ric periodic functions. 


[Figure panels: 1D grid with linear, logarithmic, Chebyshev, and uniformly random points; 2D grids
with logarithmically structured and uniformly distributed points.]

Figure 6.4: A visual representation of different interpolation grids.


In some practical applications, under the assumption that the energy decreases ex-
ponentially as time or frequency approach infinity (on an unbounded domain), the
choice of logarithmically distributed points is more appropriate.

Naturally, a dense sampling grid can be used in the beginning of the experiment
(e.g., for a lower frequency range or for small time instances). The motivation for this
approach stems from the assumption that the meaningful quantities (with high energy
or with relevant oscillations) appear in the beginning, hence requiring more samples.
Afterwards, a sparser grid of points can be chosen as the energy level decays (or as
relevant oscillations decay in time).
Additionally, the choice of Chebyshev-type points is motivated by their usage in
polynomial-based interpolation on bounded domains, for example because they elimi-
nate the Runge phenomenon⁴ (high degree polynomials are generally unsuitable
for interpolation with equispaced points).

Finally, randomly distributed sampling points often appear in stochastic experi-
ments.
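The four 1D distributions discussed above can be generated as follows. This is a minimal sketch assuming NumPy; the function name `grid_1d` and its string options are hypothetical, and the Chebyshev nodes are taken as the roots of the degree-$n$ Chebyshev polynomial mapped to $[a, b]$, which is one common convention.

```python
import numpy as np

def grid_1d(kind, a, b, n, rng=None):
    """Return n sorted sample points in [a, b] for the chosen distribution."""
    if kind == "linear":
        return np.linspace(a, b, n)
    if kind == "log":
        # Logarithmic spacing requires 0 < a < b.
        return np.logspace(np.log10(a), np.log10(b), n)
    if kind == "chebyshev":
        i = np.arange(1, n + 1)
        x = np.cos((2 * i - 1) * np.pi / (2 * n))      # roots of T_n in (-1, 1)
        return np.sort(0.5 * (a + b) + 0.5 * (b - a) * x)
    if kind == "random":
        rng = np.random.default_rng() if rng is None else rng
        return np.sort(rng.uniform(a, b, n))
    raise ValueError(f"unknown grid type: {kind}")

print(grid_1d("linear", 0.0, 1.0, 5))
print(grid_1d("log", 1e-1, 1e5, 7))
print(grid_1d("chebyshev", -1.0, 1.0, 5))
print(grid_1d("random", -1.0, 1.0, 5, rng=np.random.default_rng(0)))
```

The logarithmic grid clusters points near the lower end of the interval, while the Chebyshev grid clusters them near both end points.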


6.3.3.2 Partition of the data points and values 


Data splitting is one of the first steps in the classical Loewner algorithm (presented 
in Section 6.2). In this section, we mention various splitting schemes and how they 
affect the Loewner matrix singular value decay and also the approximation quality of 
the Loewner interpolants. 

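The data set ($n$ even) is composed of

\[
\begin{aligned}
\text{Sample points:}\quad & S = [\omega_1, \omega_2, \dots, \omega_n] \in \mathbb{R}^{n}, \quad \text{with } \omega_1 < \omega_2 < \cdots < \omega_n, \\
\text{Sample values:}\quad & H = [H(\omega_1), H(\omega_2), \dots, H(\omega_n)] \in \mathbb{C}^{n}.
\end{aligned} \tag{6.28}
\]

We analyze four different types of data splitting that are mentioned in the following.
1. First type: disjoint splitting.
   - $\mu = [\omega_1, \dots, \omega_{n/2}]$ and $V = [H(\omega_1), \dots, H(\omega_{n/2})]$,
   - $\lambda = [\omega_{n/2+1}, \dots, \omega_n]$ and $W = [H(\omega_{n/2+1}), \dots, H(\omega_n)]$.
2. Second type: alternate splitting.
   - $\mu = [\omega_1, \omega_3, \dots, \omega_{n-1}]$ and $V = [H(\omega_1), H(\omega_3), \dots, H(\omega_{n-1})]$,
   - $\lambda = [\omega_2, \omega_4, \dots, \omega_n]$ and $W = [H(\omega_2), H(\omega_4), \dots, H(\omega_n)]$.
3. Third type: magnitude splitting (in this case the set $S$ is first sorted with respect
   to the magnitude of the set $H$).
   - $\mu = [\omega_1, \dots, \omega_{n/2}]$ and $V = [H(\omega_1), \dots, H(\omega_{n/2})]$,
   - $\lambda = [\omega_{n/2+1}, \dots, \omega_n]$ and $W = [H(\omega_{n/2+1}), \dots, H(\omega_n)]$.
4. Fourth type: magnitude alternate splitting (in this case, the set $S$ is first sorted
   with respect to the magnitude of the set $H$ and then alternate splitting is applied).
   - $\mu = [\omega_1, \omega_3, \dots, \omega_{n-1}]$ and $V = [H(\omega_1), H(\omega_3), \dots, H(\omega_{n-1})]$,
   - $\lambda = [\omega_2, \omega_4, \dots, \omega_n]$ and $W = [H(\omega_2), H(\omega_4), \dots, H(\omega_n)]$.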

As observed in practice, when splitting the data as for the first type, the Loewner ma- 
trix has a very fast decay of the singular values. Moreover, in this case, the computed 
reduced models usually provide low approximation quality. 
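The four splitting schemes can be expressed compactly with array slicing. The sketch below assumes NumPy; the function name `split_data` and the scheme labels are hypothetical, and sorting by magnitude is taken to be in ascending order, which the text does not specify.

```python
import numpy as np

def split_data(omega, Hvals, scheme):
    """Split samples into left data (mu, V) and right data (lam, W).

    scheme: "disjoint", "alternate", "magnitude", or "magnitude-alternate".
    """
    omega, Hvals = np.asarray(omega), np.asarray(Hvals)
    n = omega.size
    if n % 2 != 0:
        raise ValueError("an even number of samples is expected")
    if scheme in ("magnitude", "magnitude-alternate"):
        # Assumption: sort by ascending |H| before splitting.
        order = np.argsort(np.abs(Hvals), kind="stable")
        omega, Hvals = omega[order], Hvals[order]
    if scheme in ("disjoint", "magnitude"):
        left, right = slice(0, n // 2), slice(n // 2, n)
    elif scheme in ("alternate", "magnitude-alternate"):
        left, right = slice(0, n, 2), slice(1, n, 2)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return omega[left], Hvals[left], omega[right], Hvals[right]

omega = np.arange(1.0, 9.0)          # eight sorted sample points
Hvals = 1.0 / omega                  # illustrative sample values
for scheme in ("disjoint", "alternate", "magnitude", "magnitude-alternate"):
    mu, V, lam, W = split_data(omega, Hvals, scheme)
    print(scheme, mu, lam)
```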


4 Runge’s phenomenon is a problem of oscillation at the edges of an interval that occurs when us- 
ing polynomial interpolation with polynomials of high degree over a set of equispaced interpolation 
points. 
On the other hand, for the second separation type (alternate splitting), the left
and right sets of sample points can be chosen $\epsilon$-close to one another (element-wise).
Hence, as $\epsilon \to 0$, Hermite interpolation conditions are enforced (which involve
matching the first derivative at those points).

Other observations hold in the case of second type splitting. The numerical rank of
the Loewner matrix is usually larger than that of the Loewner matrix constructed
based on the first type. Additionally, for the second type, the condition number is
smaller than that computed for the first type. For the above-mentioned cases, bounds
on the singular value decay of the Loewner matrix are provided in [15].


6.4 Case studies 


In this section we illustrate the concepts developed in the preceding sections by means
of examples. In particular the following seven examples will be analyzed.

1. The benchmark CD player ($n = 120$).
2. The function $f(x) = \exp(-x)\sin(10x)$, $x \in [-1, 1]$.
3. The inverse of the Bessel function of the first kind, in $[0, 10] \times [-1, 1]j$.
4. An Euler-Bernoulli beam.
5. A heat equation with transfer function $H(s) = \exp(-\sqrt{s})$, $s \in [0.01, 100]j$.
6. Approximation of $f = y/\sinh(y)$, $y(x) = 100\pi(x^{2} - 0.36)$, $x \in [-1, 1]$.
7. The sign function in the intervals $[-b, -a]$ and $[a, b]$, $b > a > 0$.
6.4.1 The CD player 


Consider the CD player benchmark example which is a MIMO dynamical system of 
dimension 120 with 2 inputs and 2 outputs. Here we will consider the (2, 1) sub-system, 
i. e. the SISO system from the first input to the second output. 

The goal is to approximate the transfer function in the Loewner framework. We
start by considering 400 interpolation points $\pm j\omega_i$, $i = 1,\dots,200$, where the $\omega_i$ are loga-
rithmically spaced in the interval $[10^{-1}, 10^{5}]$. Thus $\Omega = \{\omega_1, \omega_2, \dots, \omega_{200}\}$, where
$\omega_i < \omega_{i+1}$ for all $i$. We now define the left/right interpolation points in four different
ways as explained in Section 6.3.3.2 and depicted in Figure 6.5 (up).

As can be seen in Figure 6.5 (down), the decay of the Loewner matrix singular
values is faster for "half-half" (disjoint) splitting than for "alternating" splitting.

The next step is to choose the truncation order and to determine the level of ap-
proximation. We propose two different ways for this purpose.

1. By choosing equal truncation orders $r$.
2. By choosing for each separation the maximum truncation order so that $\sigma_r/\sigma_1$ is equal
   to a fixed tolerance value.
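The second strategy amounts to counting the normalized singular values that stay above the tolerance. A minimal sketch, assuming NumPy, singular values sorted in decreasing order, and the hypothetical helper name `truncation_order`:

```python
import numpy as np

def truncation_order(singular_values, tol):
    """Largest r such that sigma_r / sigma_1 >= tol (values sorted in decreasing order)."""
    s = np.asarray(singular_values, dtype=float)
    normalized = s / s[0]
    return int(np.count_nonzero(normalized >= tol))

# Illustrative decay profile, not data from the CD player example.
sigma = np.array([1.0, 1e-2, 1e-5, 1e-9, 1e-15])
print(truncation_order(sigma, 1e-8))    # 3
print(truncation_order(sigma, 1e-14))   # 4
```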
Figure 6.5: The four different splitting schemes (up) and the decay of the singular values $\sigma_i$
($i = 1,\dots,100$) of the Loewner matrix for each type (down).


First experiment: equal truncation orders
Here, we fix the truncation order to $r = 10$, and compute $\sigma_r/\sigma_1$. The results are presented
in Table 6.2.


Table 6.2: Normalized singular values corresponding to r = 10 for each splitting.

Case         1st     2nd     3rd     4th
r            10      10      10      10
σ_r/σ_1      1e-8    1e-6    1e-4    1e-4


The frequency response of the original system with those of the four reduced systems
(corresponding to each different splitting) is shown in Figure 6.6. Note that all methods
produce similar approximation quality.


Figure 6.6: Frequency response comparison: original system vs. the reduced ones with equal trunca-
tion orders (r = 10).
Next, the approximation error for each reduced system is depicted in Figure 6.7. For
the first partition type, the error curve displays a 'V' shape near the middle of
the sampling interval. This is where the left and right sampling points are very close
to each other.


Figure 6.7: Approximation error with the four splitting schemes.


Second experiment: reaching machine precision⁵
The tolerance on the normalized singular value $\sigma_r/\sigma_1$ is now fixed (e.g., $10^{-14}$). This
determines the truncation order $r$. The results are presented in Table 6.3.


Table 6.3: Different truncation orders for all splitting schemes and for a fixed tolerance.

Case         1st     2nd     3rd     4th
r            16      51      23      48
σ_r/σ_1      1e-14   1e-14   1e-14   1e-14


The truncation order for the first splitting type is more than three times smaller than
that for the second splitting type (16 vs 51).

The frequency response of the original system together with those of the four reduced
systems is depicted in Figure 6.8. All methods produce good approximation quality, with
a slight deviation in the high frequency range observed for the first splitting type.

Finally, Figure 6.9 shows the approximation error for each reduced system.

Notice that the blue curve in Figure 6.9 has a 'V' shape in the middle of the sam-
pling interval. The lowest approximation error is recorded for the second splitting type
(alternate selection).


5 Machine precision is the smallest number $\epsilon$ such that the difference between 1 and $1 + \epsilon$ is nonzero.
This is approximately $10^{-16}$.

Figure 6.8: Frequency response comparison: original system vs. the reduced ones by reaching ma-
chine precision.


Figure 6.9: Approximation error for the four splitting schemes.


6.4.2 Approximation of an oscillating function


We collect $N = 4{,}000$ measurements $\{(s_k, \phi_k) : k = 1,\dots,N\}$ of the following function:

\[
\phi(x) = e^{-x}\sin(10x), \quad x \in [-1, 1]. \tag{6.29}
\]

Assume that the interpolation points $s = [s_1, s_2, \dots, s_{4000}] \in [-1, 1]$ are equispaced;
next we restrict attention to two types of splitting.
1. First type: disjoint splitting.
   - Left: $\mu = [s_1, s_2, \dots, s_{2000}] \in [-1, 0)$,
   - Right: $\lambda = [s_{2001}, s_{2002}, \dots, s_{4000}] \in [0, 1]$.

   We construct the Loewner pencil and the underlying rank is 11.
2. Second type: alternate splitting.
   - Left: $\mu = [s_1, s_3, \dots, s_{3999}] \in [-1, 1]$,
   - Right: $\lambda = [s_2, s_4, \dots, s_{4000}] \in [-1, 1]$.

   We construct the Loewner pencil and the underlying rank is 15.
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Figure 6.10 shows the entries of the Loewner matrix in logarithmic scale for the 
two ways of sampling point separation. Next, the interpolation data is compressed, 
making use of the following methods: (a) the singular value decomposition SVD, (b) 
the randomized version rSVD, (c) CUR, implemented with DEIM and (d) CUR imple- 
mented with cross approximation. The parameters for the latter two methods are: 
€ = 0.001 and 6 = 0.01. 


Log-Scale Loewner Matrix Log-Scale Loewner Matrix 


= a -15 
200 400 600 800 1000 1200 1400 1600 1800 2000 500 1000 1500 2000 
pi à 


Figure 6.10: Entries of the Loewner matrix for the first splitting (left) and the second splitting (right). 


Table 6.4: Results for the first splitting type (disjoint) with an i5-CPU 2.60 GHz.

Reduction          Rank r   cond(L_r) = σ_max/σ_min   Error        Time (s)
SVD                11       9.7313e+10                6.7367e-10   4.166029
CUR-CrossApprox    11       7.6582e+10                1.5621e-09   0.528352
CUR-DEIM           11       1.3898e+11                2.2283e-09   4.101303
randomized SVD     11       9.7314e+10                1.1281e-10   0.030148


In Figure 6.11 the error curves for the first splitting are shown. The red Xs indicate the points selected by the CUR-cross-approximation method, while the green crosses (+) indicate the points selected by the CUR-DEIM method. In Figure 6.12 the error curves for the second splitting are shown. As opposed to the previously shown results (Figure 6.11), the error in this case (Figure 6.12) is distributed more uniformly. Additional qualitative measures (e.g., the condition number) under different splitting schemes with the same reduced order are presented in Tables 6.4 and 6.5.


Table 6.5: Results for the second splitting type (alternate) with an i5-CPU 2.60 GHz.

Reduction          Rank r   cond(L_r) = σ_max/σ_min   Error    Time (s)
SVD                11       8.8199e+4                 0.0020   4.261075
CUR-CrossApprox    11       1.0228e+5                 0.0062   0.563411
CUR-DEIM           11       9.3343e+4                 0.0245   4.152420
randomized SVD     11       8.8199e+4                 0.0020   0.024586
Figure 6.11: Selected points and approximation error for the disjoint splitting.
Figure 6.12: Selected points and approximation error for the alternate splitting.


As seen in the above experiments, the splitting of the data influences both the Loewner 
singular value decay and the quality of approximation. In most of the experiments that 
follow, we choose the alternate way of splitting the data. 


6.4.3 Approximation of a Bessel function 


In this section we investigate the approximation of the inverse of a Bessel function in a 
domain in the complex plane. If this function is considered to be the transfer function 
of a dynamical system, this system is infinite dimensional; furthermore it is not stable 
as there are poles in the right-half of the complex plane. 

In particular we consider the inverse of the Bessel function of the first kind and order $n \in \mathbb{N}$. The Bessel function is defined by the following contour integral:

$$J_n(s) = \frac{1}{2\pi j} \oint e^{\frac{s}{2}\left(t - \frac{1}{t}\right)} t^{-n-1} \, dt. \qquad (6.30)$$

Here, we consider only the case n = 0. Our aim is to approximate $H(s) = 1/J_0(s)$, $s \in \mathbb{C}$, inside the rectangle $\Omega = [0, 10] \times [-1, 1] \subset \mathbb{C}$. In Figure 6.13 (left pane) the function H(s) is shown in the domain $\Omega$. The three spikes correspond to the unstable poles of the underlying system. These are three of the zeros of the Bessel function.

Figure 6.13: Left pane: The inverse of the Bessel function of the first kind. Right pane: A subset of the 10,000-point Padua grid over the domain $\Omega = [0, 10] \times [-1, 1]$.

Here we construct approximants $H_r(s)$, of order r, of H(s), using the interpolation points shown in Figure 6.13 (right pane). The initial two-dimensional grid consists of 5,000 Padua points together with their conjugates. This grid is used to reduce the Runge phenomenon. For more details on approximation theory (e.g., the Runge phenomenon, Padua points, barycentric interpolation), we refer the reader to [64]. In [43, 44], the same experiment is presented with other types of grids (uniformly random, structured).
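For readers who wish to generate such measurements themselves, the following sketch evaluates $J_0$ and H at complex arguments. It does not use the contour integral above but the equivalent standard real-integral representation $J_0(s) = \frac{1}{\pi}\int_0^\pi \cos(s \sin\theta)\, d\theta$, discretized with the midpoint rule. The function names and the number of quadrature nodes are our own choices, and the check uses the first three positive zeros of $J_0$ (approximately 2.4048, 5.5201 and 8.6537), which are the three poles of H inside the rectangle.

```python
import numpy as np

def bessel_j0(s, n_quad=200):
    """J_0(s) from (1/pi) * int_0^pi cos(s sin(theta)) d(theta), midpoint rule."""
    s = np.asarray(s, dtype=complex)
    theta = (np.arange(n_quad) + 0.5) * np.pi / n_quad
    # The integrand is smooth and periodic, so the midpoint rule converges fast.
    return np.mean(np.cos(s[..., None] * np.sin(theta)), axis=-1)

def H(s):
    """Inverse of the Bessel function of the first kind and order zero."""
    return 1.0 / bessel_j0(s)

# First three positive zeros of J_0, i.e. the poles of H on the real axis.
j0_zeros = np.array([2.404825557695773, 5.520078110286311, 8.653727912911012])
print(np.abs(bessel_j0(j0_zeros)))      # close to zero at each pole of H

# A few (hypothetical) sample points inside [0, 10] x [-1, 1].
samples = np.array([1.0 + 0.5j, 4.0 - 1.0j, 9.5 + 0.2j])
print(H(samples))
```

A Padua grid generator is deliberately left out, since its construction is not described in this section.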

In the Loewner framework, the singular value decomposition (SVD) plays a key 
role. This factorization allows us to extract the numerical order of the rational model 
which approximates the original non-rational one. 

In Figure 6.14 (left pane), we show the distribution of the normalized singular values $\sigma_j/\sigma_1$.

Figure 6.14: Left pane: Singular value decay of 10,000 values. Right pane: Pole/zero diagram with the three original poles (zeros of the Bessel function), which are recovered with 15 digits of accuracy.


By taking measurements as in Figure 6.13 (right pane) the decay of the singular values 
Figure 6.14 — left pane, leads to a reduced order r = 12 with 2 = 4.887 - 107”. In 


Figure 6.14 (right pane), the pole/zero diagram is presented, which includes the results from all methods. The VF and Loewner (SVD or CUR) methods construct real, strictly proper rational models of degree (11, 12)⁶ with D = 0, as opposed to the AAA algorithm, which constructs a complex proper rational model of degree (12, 12) with a non-zero D term.
With the LoewCUR-cross and AAA methods, points from the sampling grid are selected. With the LoewSVD method, the point selection is obtained by compressing the initial grid. This is achieved by using the first r columns (singular vectors) of the singular vector matrices as projection matrices and by solving two generalized eigenvalue problems of dimension r x r, as explained in Section 6.2.5.3. In this way, we compress the original grid of N = 10,000 points into a much smaller set of only 2r = 24 points, which are exact interpolation points for the approximant. As it turns out, the projected points lie in the domain $\Omega$; see also the left pane of Figure 6.15.
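The projection step with truncated singular vectors can be sketched as follows. This is a minimal illustration under our own assumptions: scalar data, a hypothetical rational test function of order two in place of the Bessel example, and self-chosen function names. It shows only the projection of the Loewner pencil; the two generalized eigenvalue problems that yield the projected points (Section 6.2.5.3) are not reproduced.

```python
import numpy as np

def loewner_pencil(mu, v, lam, w):
    """Loewner and shifted Loewner matrices for left data (mu, v), right data (lam, w)."""
    d = mu[:, None] - lam[None, :]
    L = (v[:, None] - w[None, :]) / d
    Ls = (mu[:, None] * v[:, None] - lam[None, :] * w[None, :]) / d
    return L, Ls

def loewner_svd_model(mu, v, lam, w, r):
    """Project the Loewner pencil with the leading r singular vectors."""
    L, Ls = loewner_pencil(mu, v, lam, w)
    X = np.linalg.svd(np.hstack([L, Ls]))[0][:, :r]          # left projector
    Y = np.linalg.svd(np.vstack([L, Ls]))[2][:r].conj().T    # right projector
    E = -X.conj().T @ L @ Y
    A = -X.conj().T @ Ls @ Y
    B = X.conj().T @ v
    C = w @ Y
    return E, A, B, C

def evaluate(model, s):
    """Transfer function C (sE - A)^{-1} B of the projected model."""
    E, A, B, C = model
    return C @ np.linalg.solve(s * E - A, B)

def H(s):
    # Hypothetical test system of order two with poles at -1 and -2.
    return 1.0 / (s + 1.0) + 1.0 / (s + 2.0)

points = 1j * np.logspace(-1, 2, 20)        # samples on the imaginary axis
mu, lam = points[0::2], points[1::2]        # alternate splitting
model = loewner_svd_model(mu, H(mu), lam, H(lam), r=2)

poles = np.linalg.eigvals(np.linalg.solve(model[0], model[1]))
print(np.sort(poles.real))                  # approximately [-2, -1]
print(abs(evaluate(model, 0.5) - H(0.5)))   # error at a point not used as data
```

For this rational test function the projected model of order two reproduces the poles of the underlying system; for non-rational functions such as H(s) above, the order r is instead read off from the singular value decay.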


Figure 6.15: Left pane: Support and compressed points for every method over the domain $\Omega$, where LSVD(r) denotes the LoewSVD projected right points and LSVD(l) the LoewSVD projected left points. Right pane: The error for every method.


The LoewCUR-cross and AAA methods select points among the initial interpolation 
points but with different criteria. The AAA algorithm selects support points by mini- 
mizing the mean squared error with the rest of the measurements while LoewCUR uses 
cross approximation, which maximizes the absolute value of the determinant (maxi- 
mum volume) of the sub-matrix of dimension (r x r). 

In Figure 6.15 on the right pane, the error for each method is shown. The normal- 
ized error is computed as FORO with 25,000 evaluation points in Q. It should be 
mentioned that the above special choice of the original interpolation grid as Padua 
points, indeed reduced the Runge phenomenon. 

Next we wish to visualize the approximation error outside Q. Towards this goal 
we chose 25,000 equispaced evaluation points inside the domain [-3, 13] x [-3, 3]. Re- 
sults with log-contour level error of increasing order 10716, ...,107™“ are presented in 
Figure 6.16. 


6 The notation (m, n) indicates that the order of the numerator polynomial is m and the order of de- 
nominator polynomial is n. 




Figure 6.16: Extrapolation error as $\log |H(s) - H_r(s)|$ in $[-3, 13] \times [-3, 3]j \subset \mathbb{C}$. The symbol '+' marks the original poles.


All methods constructed accurate rational approximants. Notice, however, that the Loewner approach reaches a precision similar to that of AAA without performing any optimization step. Finally, in terms of computational complexity, the CUR method performed best.


6.4.4 An Euler-Bernoulli beam


In this subsection we analyse the approximation of an Euler-Bernoulli clamped beam [18]. The underlying PDE describes the oscillation of the free end. As shown in [18], the non-rational transfer function is given by

$$H(s) = \frac{s \, n(s)}{(EI + s c_d I) \, m^3(s) \, d(s)}, \quad \text{where}$$

$$m(s) = \left( \frac{-s^2}{EI + s c_d I} \right)^{1/4}, \quad d(s) = 1 + \cosh(L m(s)) \cos(L m(s)), \qquad (6.31)$$

$$n(s) = \cosh(L m(s)) \sin(L m(s)) - \sinh(L m(s)) \cos(L m(s)).$$


Usually, the next step consists of a discretization of the PDE involved. We bypass this step and instead take frequency response measurements making use of the transfer function above. The parameter specification is as in [18].⁷ The resulting frequency response of the beam is shown in Figure 6.17 (left pane).
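Sampling the transfer function directly is straightforward. The sketch below assumes the form of (6.31) with $m(s) = (-s^2/(EI + s c_d I))^{1/4}$ and the parameter values E = 6.9e10 N/m², I = 3.58e-9 m⁴ (which agrees with b h³/12 for the stated base and height), c_d = 5e-4 and L = 0.7 m. The variable and function names are ours. As a plausibility check, it compares H(s)/s at a very low frequency with the static tip compliance L³/(3EI) of a cantilever, which is the limit that follows from a series expansion of n(s) and d(s).

```python
import numpy as np

# Parameter values as read from the footnote (assumptions of this sketch).
E_MOD = 6.9e10     # Young's modulus [N/m^2]
I_MOM = 3.58e-9    # moment of inertia [m^4], about b*h**3/12
C_D = 5e-4         # damping constant
L_BEAM = 0.7       # beam length [m]

def beam_transfer(s):
    """Non-rational transfer function of the clamped beam, evaluated at s."""
    s = np.asarray(s, dtype=complex)
    stiff = E_MOD * I_MOM + s * C_D * I_MOM
    m = (-s**2 / stiff) ** 0.25          # principal branch of the fourth root
    x = L_BEAM * m
    d = 1.0 + np.cosh(x) * np.cos(x)
    n = np.cosh(x) * np.sin(x) - np.sinh(x) * np.cos(x)
    return s * n / (stiff * m**3 * d)

# Frequency response samples on the imaginary axis (illustrative range only).
omega = np.logspace(0, 2, 5)
print(np.abs(beam_transfer(1j * omega)))

# Low-frequency check against the static compliance L^3 / (3 E I).
s_low = 0.01j
static_compliance = L_BEAM**3 / (3.0 * E_MOD * I_MOM)
print(abs(beam_transfer(s_low) / s_low), static_compliance)
```

The value of n(s)/m³(s) does not depend on which of the four complex roots is taken for m(s), so using the principal branch is harmless here.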


7 Young's modulus (elasticity constant): E = 69 GPa = 6.9 · 10^10 N/m², moment of inertia: I = 3.58 · 10^-9 m⁴, damping constant: c_d = 5 · 10^-4, length: L = 0.7 m, base: b = 0.07 m, height: h = 0.0085 m.
Figure 6.17: Left pane: Original frequency response of the beam. Right pane: The approximant constructed with the Loewner framework.


The next step is to collect 2,000 measurements on the imaginary axis (frequencies 
jw;,i = 1,...,2000), spaced logarithmically from 1 rad/s to 10° rad/s. These points are 
depicted in the left pane of Figure 6.18. 
Figure 6.18: Left pane: 2,000 sampling points alternating as left and right. Right pane: The singular value decay.


The decay of the singular values of the Loewner matrices is shown in Figure 6.18 (right pane). Based on this decay, we construct a reduced model of dimension r = 44; the resulting Loewner approximant is depicted in Figure 6.17 (right pane).

Finally, the poles and zeros for every method are presented in Figure 6.19. The 
quality of the approximation is given for each method in Figure 6.20 where the evalu- 
ation is in the frequency range from 1 to 10%°. The error outside the sampling domain 
increases thus indicating the difficulty of approximation outside of the sampling do- 
main for infinite dimensional systems. 


6.4.5 Heat equation 


Next, we investigate a one-dimensional heat equation [13]. The corresponding PDE describing the diffusion of heat leads to the following non-rational transfer function:


H(s) = ae. sec. (6.32) 
Figure 6.19: Pole/zero diagram for every method (LoewSVD, LoewCUR-cross, LoewCUR-DEIM, VF and AAA).
Figure 6.20: The error distribution on a grid of 8,000 evaluation points.


The aim is to construct reduced models by means of the Loewner framework and compare the results with the TF-IRKA used in [13]. The Iterative Rational Krylov Algorithm (IRKA) [14] builds optimal reduced models by minimizing the $\mathcal{H}_2$ error [39].

By collecting 1,000 values of the transfer function on the imaginary axis, a reduced order of r = 6 was chosen (as in [13]). For this truncation order,

~ 6 - 107°. In Figure 6.21c, the pole/zero distribution for every method is depicted; 

in Figure 6.21d, the selected points are shown. It is worth mentioning that the Loewner 
SVD method produced poles near to the optimal set computed by means of IRKA; see 
Figure 6.21c. Approximation results are in Figure 6.22. 


Figure 6.21: Approximation of the heat equation with LoewSVD, LoewCUR, VF, AAA, TF-IRKA.
Figure 6.22: Approximation results for the heat equation with various interpolation methods.


6.4.6 Approximation of a two-peak function


In this section we present an example involving a hyperbolic sine from [22]. The difficulty here results from the two differentiable peaks. More precisely, the function is

$$f(x) = \frac{100\pi (x^2 - 0.36)}{\sinh(100\pi (x^2 - 0.36))}, \quad x \in [-1, 1],$$

and is shown in Figure 6.23 (left pane). We approximate this function by choosing 1,000 equispaced points in [-1, 1], as in the right pane of Figure 6.23. The singular values of the Loewner matrix are shown in Figure 6.24 (left pane), while the selected points
are shown on the right pane of the same figure. The order is selected to be r = 38 with 
(5 = 107”). In Figure 6.25, the distribution of the poles and zeros for each method is 
shown. On the other hand, AAA looks quite different because it does not impose real 
symmetry. 
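When sampling this function numerically, the two peaks at x = ±0.6 need some care, because the quotient takes the form 0/0 there, with limit 1. The following sketch is one possible way to evaluate f safely, assuming the formula $f(x) = 100\pi(x^2 - 0.36)/\sinh(100\pi(x^2 - 0.36))$. The function name and the threshold `1e-8` are our own choices.

```python
import numpy as np

def two_peak(x):
    """Evaluate z / sinh(z) with z = 100*pi*(x^2 - 0.36), using the limit 1 at z = 0."""
    x = np.asarray(x, dtype=float)
    z = 100.0 * np.pi * (x**2 - 0.36)
    small = np.abs(z) < 1e-8
    z_safe = np.where(small, 1.0, z)       # avoid 0/0 at the peaks
    return np.where(small, 1.0, z_safe / np.sinh(z_safe))

x_samples = np.linspace(-1.0, 1.0, 1000)   # 1,000 equispaced points as in the text
f_samples = two_peak(x_samples)

print(two_peak(0.6), two_peak(-0.6))        # peak values
print(two_peak(0.0))                        # essentially zero between the peaks
print(f_samples.max())
```

Away from the peaks the function decays extremely fast, which is what makes the two peaks so sharp on the scale of the interval [-1, 1].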


Remark 6.8. In Figure 6.24, right pane, the different support points are shown. In the case of the LoewSVD method, two almost purely imaginary projected points are obtained even though the initial sampling points were real.
Figure 6.23: Left pane: The function f with two very sharp differentiable peaks. Right pane: 1,000 sampling points and a zoom-in close to one peak.
Figure 6.24: Left pane: Singular value decay. Right pane: Various points for every method and the projected points from the Loewner framework.
Figure 6.25: The pole/zero diagram.


Finally we observe a good fit for every method, with slightly better performance at- 
tained for the Loewner SVD method (see the error plot in Figure 6.26). 


Figure 6.26: The error profile with 5,000 evaluation points over [-1, 1].


6.4.7 Approximation of the sign function


Our final case study concerns the approximation of the sign function, known as Zolotarev’s fourth problem. Here, we compare the approximation obtained using the Loewner SVD with the optimal solution, which is explicitly known [1]. Given two disjoint closed complex sets E and F, Zolotarev’s fourth problem is to find the rational function $r(x) = p(x)/q(x)$, where p, q are polynomials of degree k, that deviates least from the sign function

$$\operatorname{sign}(x) = \begin{cases} -1, & x \in E, \\ +1, & x \in F, \end{cases}$$

on $E \cup F$. For general sets E and F, the solution to Zolotarev’s fourth problem is not known; however, there are special cases where the rational function can be given explicitly. For the real disjoint intervals E = [-b, -1] and F = [1, b] with b > 1, an explicit (optimal) solution to Zolotarev’s fourth problem is known [1]. Here, we investigate how well the Loewner framework can approximate this discontinuous function on two symmetric real intervals. We choose b = 3 and N = 2,000 initial interpolation points from $[-3, -1] \cup [1, 3]$. We perform two experiments: first, we choose the initial interpolation points as equispaced and, second, as Chebyshev nodes. For each choice, we split the data as “half-half” and “alternating” as discussed previously. The left pane in Figure 6.27 shows the plot of the sign function.
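The data for this experiment are easy to generate. The sketch below is an illustration under our own assumptions: Chebyshev nodes of the first kind mapped to each interval, 1,000 nodes per interval, and self-chosen helper names. It also shows how the two splittings shape the Loewner matrix. With the "half-half" splitting all left values are -1 and all right values are +1, so every entry equals $2/(\lambda_j - \mu_i) > 0$. With the "alternating" splitting, every pair of points from the same interval produces a zero entry.

```python
import numpy as np

def chebyshev_nodes(a, b, n):
    """Chebyshev nodes (first kind) mapped to [a, b], in increasing order."""
    k = np.arange(1, n + 1)
    x = np.cos((2 * k - 1) * np.pi / (2 * n))
    return np.sort(0.5 * (a + b) + 0.5 * (b - a) * x)

def loewner_matrix(mu, v, lam, w):
    """Loewner matrix with entries (v_i - w_j) / (mu_i - lambda_j)."""
    return (v[:, None] - w[None, :]) / (mu[:, None] - lam[None, :])

b, n_half = 3.0, 1000
s = np.concatenate([chebyshev_nodes(-b, -1.0, n_half),
                    chebyshev_nodes(1.0, b, n_half)])
values = np.sign(s)

# "Half-half": left points in [-3, -1], right points in [1, 3].
L_half = loewner_matrix(s[:n_half], values[:n_half], s[n_half:], values[n_half:])

# "Alternating": every other point goes left, the remaining ones go right.
L_alt = loewner_matrix(s[0::2], values[0::2], s[1::2], values[1::2])

print(L_half.min() > 0.0)                   # all entries positive
print(np.count_nonzero(L_alt), L_alt.size)  # half of the entries vanish
```

The "half-half" Loewner matrix is thus a Cauchy-type matrix, whereas the "alternating" one has a block structure with two zero blocks after reordering.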
Figure 6.27: Left pane: The sign function. Right pane: 200 Chebyshev points in [-3, -1] ∪ [1, 3].


In [17] the explicit solution of this optimization problem is computed. We start by taking N = 2,000 measurements at Chebyshev nodes, as in Figure 6.27 (right pane). This sampling leads to the singular value decay of the Loewner matrices shown in Figure 6.28 (left pane).

Figure 6.28: Left pane: The singular value decay of the Loewner pencil. Right pane: Pole/zero diagram for the Loewner and the optimal approximant with order r = 4.


From the rank-revealing factorization in the left pane in Figure 6.28, we chose r = 
4 with z = 1.657 - 10~™*. In Figure 6.28 on the right pane is the distribution of the 
pole/zero diagram which is derived from the Loewner SVD method, in comparison 
with the optimal set is presented. 

In Figure 6.29 (left pane) the Loewner approximant is shown. When the Chebyshev nodes are chosen and the left and right points are split as “half-half”, it is quite close to the optimal one. Indeed, the error of the optimal interpolant, shown by the blue line in Figure 6.30, has the equioscillation property of the optimal approximant in the infinity norm, $\|x\|_\infty = \max(|x_1|, \ldots, |x_n|)$. Thus the equioscillation of the error $|\operatorname{sign}(x) - r(x)|$ on both intervals shows the optimality of the approximant. The Loewner framework succeeds in constructing an approximant very close to the optimal one. Another aspect is shown in Figure 6.29 (right pane): the projected points are indeed interpolation points.


Figure 6.29: Left pane: A comparison between the Loewner approximant and the optimal one of order r = 4. Right pane: The projected points are approximated interpolation points.


Remark 6.9. If the choice of the splitting is disjoint (“half-half”) as in this experiment, the constructed approximant interpolates the data as in Figure 6.29 (right pane). If the choice is “alternating”, mixing left and right points, then the projected low-order model approximates the values and the derivatives at the interpolation points as in Figure 6.31.
Figure 6.30: Error curves of the Loewner approximant and of the optimal solution, both of order r = 4.
Figure 6.31: By splitting the data as “alternating”, the projected Loewner model approximates the first derivative as well (Hermite interpolation conditions).


6.5 Epilogue 


Interpolatory methods for model identification and reduction were studied in this con- 
tribution. The main focus was on the Loewner framework. The aim was to introduce 
the Loewner framework by providing results which connect this rational interpolation 
tool with system theory. At the same time, algorithms that make the Loewner frame- 
work a complete numerical tool for approximation with ease of implementation are 
offered. Several case studies illustrate the effectiveness of the method. Implementa- 
tion issues like the splitting of the data in left and right were addressed. Finally, con- 
nections with the SVD, the r-SVD, CUR, VF and IRKA have been detailed. 
Ralf Zimmermann
7 Manifold interpolation 


Abstract: One approach to parametric and adaptive model reduction is via the inter- 
polation of orthogonal bases, subspaces or positive definite system matrices. In all 
these cases, the sampled inputs stem from matrix sets that feature a geometric struc- 
ture and thus form so-called matrix manifolds. This chapter reviews the numerical 
treatment of the most important matrix manifolds that arise in the context of model 
reduction. Moreover, the principal approaches to data interpolation and Taylor-like 
extrapolation on matrix manifolds are outlined and complemented by algorithms in 
pseudo-code. 


Keywords: parametric model reduction, matrix manifold, interpolation, Riemannian 
computing, Riemannian normal coordinates 
7.1 Introduction & motivation


This chapter addresses interpolation approaches for parametric model reduction. This 
includes techniques for 

— computing trajectories of parameterized subspaces, 

— computing trajectories of parameterized reduced orthogonal bases, 

—  structure-preserving interpolation. 


Mathematically, this requires data processing on nonlinear matrix manifolds. The ex- 
position at hand intends to be an introduction and a reference guide to numerical 
procedures with matrix manifold-valued data. As such it addresses practitioners and 
scientists new to the field. It covers the essentials of those matrix manifolds that arise 
most frequently in practical problems in model reduction. The main purpose is not 
to discuss concrete model reduction applications, but rather to provide the essential 
tools, building blocks and background theory to enable the reader to devise her/his 
own approaches for such applications. 

The text is designed to work as a commented formula collection, while giving sufficient context, explanations and, not least, precise references to enable the interested reader to immerse further in the topic.


7.1.1 Parametric model reduction via manifold interpolation: an introductory example


The basic objective in model reduction is to emulate a large-scale dynamical system 
with very few degrees of freedom such that its input/output behavior is preserved 
as well as possible. While classical model reduction techniques aim at producing 
an accurate low-order approximation to the autonomous behavior of the original 
system, parametric model reduction (pMOR) tries to account for additional system 
parameters. If we look for instance at aircraft aerodynamics, an important task is to 
solve the unsteady Navier-Stokes equations at various flight conditions, which are, 
amongst others, specified by the altitude, the viscosity of the fluid (i. e. the Reynolds 
number) and the relative velocity (i.e. the Mach number). We explain the objec- 
tive of pMOR with the aid of a generic example in the context of proper orthogonal 
decomposition-based model reduction. Similar considerations apply to frequency 
domain approaches, Krylov subspace methods and balanced truncation, which are discussed, e.g., in Chapters 2 and 3 of this volume and in [19, Chapter 1], [20, Chapter 3].
Consider a spatio-temporal dynamical system in semi-discrete form

$$\frac{\partial}{\partial t} x(t, \mu) = f(x(t, \mu); \mu), \quad x(t_0, \mu) = x_{0,\mu}, \qquad (7.1)$$

where $x(t, \mu) \in \mathbb{R}^n$ is the spatially discretized state vector of dimension n, the vector $\mu = (\mu_1, \ldots, \mu_d) \in \mathbb{R}^d$ accounts for additional system parameters and $f(\cdot; \mu) : \mathbb{R}^n \to \mathbb{R}^n$ is the (possibly nonlinear, parameter-dependent) right hand side function. Projection-based MOR starts with constructing a suitable low-dimensional subspace that acts as a space of candidate solutions.

Subspace construction. One way to construct the required projection subspace is the proper orthogonal decomposition (POD); see [19, Chapter 2]. In its simplest form, the POD can be summarized as follows. For a fixed system parameter $\mu = \mu_0$, let $x^1 := x(t_1, \mu_0), \ldots, x^m := x(t_m, \mu_0) \in \mathbb{R}^n$ be a set of state vectors satisfying (7.1) and let $\mathbb{S} := (x^1, \ldots, x^m) \in \mathbb{R}^{n \times m}$. The state vectors $x^i$ are called snapshots and the matrix $\mathbb{S}$ is called the associated snapshot matrix. POD is concerned with finding a subspace $\mathcal{V}$ of dimension $r \le m$ represented by a column-orthogonal matrix $\mathbb{V}_r \in \mathbb{R}^{n \times r}$ such that the error between the input snapshots and their orthogonal projection onto $\mathcal{V} = \operatorname{ran}(\mathbb{V}_r)$ is minimized:

$$\min_{V \in \mathbb{R}^{n \times r},\; V^T V = I} \sum_{k} \| x^k - V V^T x^k \|_2^2 \quad \Bigl( \Leftrightarrow \min_{V \in \mathbb{R}^{n \times r},\; V^T V = I} \| \mathbb{S} - V V^T \mathbb{S} \|_F^2 \Bigr).$$

The main result of POD is that, for any $r \le m$, the best r-dimensional approximation of $\operatorname{ran}(x^1, \ldots, x^m)$ in the above sense is $\mathcal{V} = \operatorname{ran}(v^1, \ldots, v^r)$, where $\{v^1, \ldots, v^r\}$ are the eigenvectors of the matrix $\mathbb{S}\mathbb{S}^T$ corresponding to the r largest eigenvalues. The subspace $\mathcal{V}$ is called the POD subspace and the matrix $\mathbb{V}_r = (v^1, \ldots, v^r)$ is the POD basis matrix. The same subspace is obtained via a compact singular value decomposition (SVD) of the snapshot matrix $\mathbb{S} = \mathbb{V} \Sigma Z^T$, truncated to the first $r \le m$ columns of $\mathbb{V} \in \mathbb{R}^{n \times m}$ by setting $\mathcal{V} := \operatorname{ran}(\mathbb{V}_r)$. For more details, see, e.g., [18, §3.3]. In the following, we drop the index r and assume that $\mathbb{V}$ is already the truncated matrix $\mathbb{V} = (v^1, \ldots, v^r) \in \mathbb{R}^{n \times r}$.
Since the input snapshots are supplied at a fixed system parameter vector $\mu_0$, the POD subspace is considered to be an appropriate space of solution candidates $\mathcal{V}(\mu_0) = \operatorname{ran}(\mathbb{V}(\mu_0))$ at $\mu_0$.
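The POD construction can be reproduced in a few lines. The following sketch uses synthetic snapshot data (a random matrix of rank 6, which is purely hypothetical and stands in for solutions of (7.1)) and self-chosen names. It computes the POD basis from a compact SVD and compares the projection error with the discarded singular values, and the leading eigenvalues of the snapshot Gram matrix with the squared singular values.

```python
import numpy as np

def pod_basis(S, r):
    """POD basis of dimension r from the compact SVD of the snapshot matrix S."""
    V, sigma, _ = np.linalg.svd(S, full_matrices=False)
    return V[:, :r], sigma

rng = np.random.default_rng(0)
n, m, r = 50, 12, 4
# Synthetic snapshots: m columns in R^n, spanning a 6-dimensional subspace.
S = rng.standard_normal((n, 6)) @ rng.standard_normal((6, m))

V, sigma = pod_basis(S, r)

# Error of the orthogonal projection of the snapshots onto ran(V).
proj_error = np.linalg.norm(S - V @ (V.T @ S), "fro")
# The r largest eigenvalues of S S^T are the squared leading singular values.
top_eigs = np.linalg.eigvalsh(S @ S.T)[::-1][:r]

print(proj_error, np.sqrt(np.sum(sigma[r:] ** 2)))
print(top_eigs, sigma[:r] ** 2)
```

In practice the SVD route is usually preferred over forming the n-by-n matrix of snapshot outer products explicitly, since n is the large dimension.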
Projection. POD leads to a parameter decoupling

$$\tilde{x}(t, \mu_0) = \mathbb{V}(\mu_0) x_r(t). \qquad (7.2)$$

In this way, the time trajectory of the reduced model is uniquely defined by the coefficient vector $x_r(t) \in \mathbb{R}^r$ that represents the reduced state vector with respect to the subspace $\operatorname{ran}(\mathbb{V}(\mu_0))$. Given a matrix $\mathbb{W}(\mu_0)$ such that the matrix pair $\mathbb{V}(\mu_0), \mathbb{W}(\mu_0)$ is bi-orthogonal, i.e. $\mathbb{W}(\mu_0)^T \mathbb{V}(\mu_0) = I$, the original system (7.1) can be reduced in dimension as follows. Substituting (7.2) in (7.1) and multiplying with $\mathbb{W}(\mu_0)^T$ from the left leads to

$$\frac{d}{dt} x_r(t) = \mathbb{W}^T(\mu_0) f(\mathbb{V}(\mu_0) x_r(t); \mu_0), \quad x_r(t_0) = \mathbb{V}^T(\mu_0) x_{0,\mu_0}. \qquad (7.3)$$


This approach goes by the name of Petrov-Galerkin projection if $\mathbb{W}(\mu_0) \neq \mathbb{V}(\mu_0)$ and Galerkin projection if $\mathbb{W}(\mu_0) = \mathbb{V}(\mu_0)$. There are various ways to proceed from (7.3) depending on the nature of the function f and many of them are discussed in other chapters of Model Order Reduction. If $f(\cdot; \mu_0)$ is linear, the reduced operator $\mathbb{W}^T(\mu_0) \circ f(\cdot; \mu_0) \circ \mathbb{V}(\mu_0)$ can be computed a priori (‘offline’) and stays fixed throughout the time integration. If $f(\cdot; \mu_0)$ is affine, the same approach can be carried over to the affine building blocks of $f(\cdot; \mu_0)$; see e.g. [45]. For a nonlinear $f(\cdot; \mu_0)$, an affine approximation can be constructed via the empirical interpolation method (EIM, [14]). Other approaches that address nonlinearities include the discrete empirical interpolation method (DEIM, [30]) and the missing point estimation (MPE, [13, 105]).

For illustration purposes, we proceed with $\mathbb{W}(\mu_0) = \mathbb{V}(\mu_0)$ and assume that the right hand side function f splits into a linear and a nonlinear part: $f(x; \mu_0) = A(\mu_0) x + \mathbf{f}(x; \mu_0)$, where $A(\mu_0) \in \mathbb{R}^{n \times n}$ is, say, a symmetric and negative definite matrix to foster stability. Then (7.3) becomes

$$\frac{d}{dt} x_r(t) = \mathbb{V}^T(\mu_0) A(\mu_0) \mathbb{V}(\mu_0) x_r(t) + \mathbb{V}^T(\mu_0) \mathbf{f}(\mathbb{V}(\mu_0) x_r(t); \mu_0).$$


In the discrete empirical interpolation method (DEIM, [30]), the large-scale nonlinear term $\mathbf{f}(\mathbb{V}(\mu_0) x_r(t); \mu_0)$ is approximated via a mask matrix $P = (e_{i_1}, \ldots, e_{i_s}) \in \mathbb{R}^{n \times s}$, where $\{i_1, \ldots, i_s\} \subset \{1, \ldots, n\}$ and $e_j = (\ldots, 0, 1, 0, \ldots)^T \in \mathbb{R}^n$ is the jth canonical unit vector. The mask matrix P acts as an entry selector on a given n-vector via $P^T v = (v_{i_1}, \ldots, v_{i_s})^T \in \mathbb{R}^s$. In addition, another POD basis matrix $\mathbb{U}(\mu_0) \in \mathbb{R}^{n \times s}$ is used, which is obtained from snapshots of the nonlinear term. The matrices P and $\mathbb{U}(\mu_0)$ are combined to form an oblique projection of the nonlinear term onto the subspace $\operatorname{ran}(\mathbb{U}(\mu_0))$. This leads to the reduced model

$$\frac{d}{dt} x_r(t) = \mathbb{V}^T(\mu_0) A(\mu_0) \mathbb{V}(\mu_0) x_r(t) + \mathbb{V}^T(\mu_0) \mathbb{U}(\mu_0) \bigl(P^T \mathbb{U}(\mu_0)\bigr)^{-1} P^T \mathbf{f}(\mathbb{V}(\mu_0) x_r(t); \mu_0), \qquad (7.4)$$

whose computational complexity is formally independent of the full-order dimension n; see [30] for details. Mind that by assumption, $M(\mu_0) := -\mathbb{V}^T(\mu_0) A(\mu_0) \mathbb{V}(\mu_0)$ is symmetric positive definite and that both $\mathbb{V}(\mu_0)$ and $\mathbb{U}(\mu_0)$ are column-orthogonal. Moreover, for a fixed mask matrix P, coordinate changes of $\mathbb{V}(\mu_0)$ and $\mathbb{U}(\mu_0)$ do not affect the approximated state $\tilde{x}(t, \mu_0) = \mathbb{V}(\mu_0) x_r(t)$, so that essentially, the reduced system (7.4) depends only on the subspaces $\operatorname{ran}(\mathbb{V}(\mu_0))$ and $\operatorname{ran}(\mathbb{U}(\mu_0))$ rather than the matrices $\mathbb{V}(\mu_0)$ and $\mathbb{U}(\mu_0)$.¹
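The oblique projection and its dependence on the subspace alone can be checked numerically. The sketch below is an illustration with random data. The index set `idx` is a hypothetical fixed choice (DEIM would select the indices greedily), the function name is ours, and the projector is formed as a dense n-by-n matrix only for demonstration purposes. It verifies that the projector is idempotent, reproduces the selected entries of a vector exactly, and is unchanged when the basis U is replaced by US with an orthogonal S.

```python
import numpy as np

def deim_projector(U, idx):
    """Dense oblique projector U (P^T U)^{-1} P^T for the mask defined by idx."""
    n = U.shape[0]
    P = np.eye(n)[:, idx]                 # mask matrix with columns e_{i_1}, ..., e_{i_s}
    return U @ np.linalg.solve(P.T @ U, P.T)

rng = np.random.default_rng(1)
n, s = 30, 5
U, _ = np.linalg.qr(rng.standard_normal((n, s)))   # column-orthogonal basis
idx = [2, 7, 11, 19, 23]                           # hypothetical selected entries

Pi = deim_projector(U, idx)
f = rng.standard_normal(n)                         # stand-in for a nonlinear term
f_approx = Pi @ f

# Change of coordinates within ran(U) by an orthogonal s-by-s matrix.
S, _ = np.linalg.qr(rng.standard_normal((s, s)))
Pi_rotated = deim_projector(U @ S, idx)

print(np.linalg.norm(Pi @ Pi - Pi))                # projector property
print(np.linalg.norm(f_approx[idx] - f[idx]))      # exact at the selected entries
print(np.linalg.norm(Pi_rotated - Pi))             # depends on ran(U) only
```

In an actual reduced model only the small matrix $\mathbb{V}^T \mathbb{U} (P^T \mathbb{U})^{-1}$ is precomputed, and the nonlinear term is evaluated at the s selected entries.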

Solving (7.3), (7.4) constitutes the online stage of model reduction. The main focus of this chapter is not on the efficient solution of the reduced systems (7.3) or (7.4) at a fixed $\mu_0$, but on tackling parametric variations in $\mu$. In view of the associated computational costs, it is important that this can be achieved without computing additional snapshots in the online stage.

A straightforward way to achieve this is to extend the snapshot sampling to the $\mu$-parameter range to produce POD basis matrices that are to cover all input parameters. This is usually referred to as the “global approach”. For nonlinear systems, the global approach may suffer from requiring a large number of snapshot samples. Moreover, the snapshot information is blurred in the global POD and features that occur only in a restricted regime affect the ROM predictions everywhere. Therefore, localized approaches are preferable; see e. g. the applications in and the numerical examples in [38, 100].

In this chapter, the focus is on constructing trajectories of functions in the system parameters $\mu$ on certain sets of structured matrix spaces. In the above example, these are the symmetric positive definite matrices $\{M \in \mathbb{R}^{r \times r} \mid M^T = M,\; v^T M v > 0 \;\forall v \neq 0\}$, the orthonormal basis matrices $\{U \in \mathbb{R}^{n \times s} \mid U^T U = I\}$ or the associated s-dimensional subspaces $\mathcal{U} := \operatorname{ran}(U) \subset \mathbb{R}^n$:

$$\mu \mapsto -\mathbb{V}^T(\mu) A(\mu) \mathbb{V}(\mu) \in \{M \in \mathbb{R}^{r \times r} \mid M^T = M,\; v^T M v > 0 \;\forall v \neq 0\},$$
$$\mu \mapsto \mathbb{U}(\mu) \in \{U \in \mathbb{R}^{n \times s} \mid U^T U = I\},$$
$$\mu \mapsto \mathcal{U}(\mu) = \operatorname{ran}(\mathbb{U}(\mu)) \in \{\mathcal{U} \subset \mathbb{R}^n \mid \mathcal{U} \text{ subspace},\; \dim(\mathcal{U}) = s\}.$$

We outline generic methods for constructing such trajectories via interpolation. All the special sets of matrices considered above feature a differentiable structure that allows one to consider them (directly or indirectly) as submanifolds of some Euclidean matrix space, referred to as matrix manifolds. The above example is not exhaustive. Other matrix manifolds may arise in model reduction applications.


1 Replacing $\mathbb{U}$ with $\mathbb{U}S$, $S \in \mathbb{R}^{s \times s}$ orthogonal, does not affect (7.4) at all. Replacing $\mathbb{V}$ with $\mathbb{V}R$, $R \in \mathbb{R}^{r \times r}$ orthogonal, induces a coordinate change on the reduced state $x_r = R\tilde{x}_r$, but preserves the output $\tilde{x}(t) = \mathbb{V} x_r(t) = \mathbb{V} R \tilde{x}_r(t)$.

To keep the exposition both general and modular, the interpolation techniques 
will be formulated for arbitrary submanifolds. For working examples that put these 
techniques into action, the reader is referred to Chapter 5 of Volume 2 and Chapter 9 
of Volume 3 of Model Order Reduction. Model reduction literature on manifold inter- 
polation problems includes [8, 9, 18, 32, 34, 67, 74, 76, 78, 94, 100]. 


7.1.2 Structure and organization 


The text is organized in a modular rather than consecutive fashion, so that selective reading is possible. This entails, however, that the reader will encounter some repetition.

Section 7.2 covers the essential background from differential geometry. Section 7.3 contains generic methods for interpolation and extrapolation on matrix manifolds. In Section 7.4, the geometric and numerical aspects of the matrix manifolds that arise most frequently in the context of model reduction are discussed.

A practitioner who faces a problem in matrix manifold interpolation may skim through the recap on elementary differential geometry in Section 7.2 and then move on to the subsection of Section 7.4 that corresponds to the matrix manifold in the application. This provides the specific ingredients and formulas for conducting the generic interpolation methods of Section 7.3.


7.1.3 Notation & abbreviations 


- w.r.t.: with respect to
- EVD: eigenvalue decomposition
- SVD: singular value decomposition
- POD: proper orthogonal decomposition
- LTI: linear time-invariant (system)
- ODE: ordinary differential equation
- PDE: partial differential equation
- ONB: orthonormal basis
- $\mathbb{R}^{n \times r}$: the set of real n-by-r matrices
- $I_n$: the n-by-n identity matrix; if dimensions are clear, written as I
- $\operatorname{ran}(A)$: the subspace spanned by the columns of $A \in \mathbb{R}^{n \times r}$
- $GL(n)$: the general linear group of real, invertible n-by-n matrices
- $\operatorname{sym}(n) = \{A \in \mathbb{R}^{n \times n} \mid A^T = A\}$: the set of real, symmetric n-by-n matrices
- $\operatorname{skew}(n) = \{A \in \mathbb{R}^{n \times n} \mid A^T = -A\}$: the set of real, skew-symmetric n-by-n matrices
- $SPD(n) = \{A \in \operatorname{sym}(n) \mid x^T A x > 0 \;\forall x \in \mathbb{R}^n \setminus \{0\}\}$: the set of real, symmetric positive definite n-by-n matrices
- $O(n) = \{Q \in \mathbb{R}^{n \times n} \mid Q^T Q = I_n = Q Q^T\}$: the orthogonal group
- $SO(n) = \{Q \in O(n) \mid \det(Q) = 1\}$: the special orthogonal group
- $St(n, r) = \{U \in \mathbb{R}^{n \times r} \mid U^T U = I_r\}$: the (compact) Stiefel manifold, $r \le n$
- $Gr(n, r)$: the Grassmann manifold of r-dimensional subspaces of $\mathbb{R}^n$, $r \le n$
- $\mathcal{M}$: a differentiable manifold
- $\mathcal{D}_p \subset \mathcal{M}$: an open domain around the point p on a manifold $\mathcal{M}$
- $D_x \subset \mathbb{R}^n$: an open domain in the Euclidean space around a point $x \in \mathbb{R}^n$
- $T_p\mathcal{M}$: the tangent space of $\mathcal{M}$ at a location $p \in \mathcal{M}$
- $\langle A, B \rangle_0 = \operatorname{trace}(A^T B)$: the standard (Frobenius) inner product on $\mathbb{R}^{n \times r}$
- $\langle v, w \rangle_p^{\mathcal{M}}$: the Riemannian metric on $T_p\mathcal{M}$ (the superscript is often omitted)
- $\exp_m$: standard matrix exponential
- $\log_m$: standard (principal) matrix logarithm
- $\operatorname{Exp}_p^{\mathcal{M}}$: the Riemannian exponential of a manifold $\mathcal{M}$ at base point $p \in \mathcal{M}$
- $\operatorname{Log}_p^{\mathcal{M}}$: the Riemannian logarithm of a manifold $\mathcal{M}$ at base point $p \in \mathcal{M}$.


7.2 Basic concepts of differential geometry 


This section provides the essentials on elementary differential geometry. Established 
textbook references on differential geometry include [35, 60, 61, 63, 65]; condensed 
introductions can be found in [49, Appendices C.3, C.4, C.5] and [39]. An account of 
differential geometry that is tailor-made to matrix manifold applications is given in 
[3]. 

The fundamental objects of study in differential geometry are differentiable manifolds. Differentiable manifolds are generalizations of curves (one-dimensional) and surfaces (two-dimensional) to arbitrary dimensions. Loosely speaking, an n-dimensional differentiable manifold $\mathcal{M}$ is a topological space that ‘locally looks like $\mathbb{R}^n$’ with certain smoothness properties. This concept is rendered precise by postulating that, for every point $p \in \mathcal{M}$, there exists a so-called coordinate chart $x : \mathcal{M} \supset \mathcal{D}_p \to D_{x(p)} \subset \mathbb{R}^n$ that bijectively maps an open neighborhood $\mathcal{D}_p \subset \mathcal{M}$ of a location p to an open neighborhood $D_{x(p)} \subset \mathbb{R}^n$ around $x(p) \in \mathbb{R}^n$ with the important additional property that the coordinate change

$$x \circ \tilde{x}^{-1} : \tilde{x}(\mathcal{D}_p \cap \tilde{\mathcal{D}}_p) \to x(\mathcal{D}_p \cap \tilde{\mathcal{D}}_p)$$

of two such charts $x, \tilde{x}$ is a diffeomorphism, where their domains of definition overlap; see [39, Figure 18.2, p. 496] or [49, Figure 3.1, p. 342]. Note that the coordinate change $x \circ \tilde{x}^{-1}$ maps from an open domain of $\mathbb{R}^n$ to an open domain of $\mathbb{R}^n$, so that the standard concepts of multivariate calculus apply. For details, see [3, §3.1.1] or [39, §18.8]. Depending on the context, we will write x(p) for the value of a coordinate chart at p and also $x \in \mathbb{R}^n$ for a point in $\mathbb{R}^n$.

Of special importance to numerical applications are embedded submanifolds in the Euclidean space.
Of special importance to numerical applications are embedded submanifolds in 
the Euclidean space. 


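Definition 7.1 (Submanifolds of $\mathbb{R}^{n+d}$). A parameterization is a bijective differentiable
function $f : \mathbb{R}^n \supset D \to f(D) \subset \mathbb{R}^{n+d}$ with continuous inverse such that its Jacobi matrix
$Df_x \in \mathbb{R}^{(n+d) \times n}$ has full rank n at every point $x \in D$.

A subset $\mathcal{M} \subset \mathbb{R}^{n+d}$ is called an n-dimensional embedded submanifold of $\mathbb{R}^{n+d}$, if
for every $p \in \mathcal{M}$, there exists an open neighborhood $\Omega \subset \mathbb{R}^{n+d}$ such that $D_p = \mathcal{M} \cap \Omega$
is the image of a parameterization

$$f : \mathbb{R}^n \supset D_x \to f(D_x) = D_p = \mathcal{M} \cap \Omega \subset \mathbb{R}^{n+d}.$$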


One can show that, if $f : D \to \mathcal{M} \cap \Omega$ and $\tilde{f} : \tilde{D} \to \mathcal{M} \cap \tilde{\Omega}$ are two parameterizations,
say with $f(x_0) = \tilde{f}(\tilde{x}_0) = p \in \mathcal{M} \cap \Omega \cap \tilde{\Omega}$, then

$$\tilde{f}^{-1} \circ f : f^{-1}(\Omega \cap \tilde{\Omega}) \to \tilde{f}^{-1}(\Omega \cap \tilde{\Omega})$$

is a diffeomorphism (between open sets in $\mathbb{R}^n$). In this sense, parameterizations $f$ are
the inverses of coordinate charts $x$. In addition to coordinate charts and parameterizations,
submanifolds can be characterized via equality constraints. This fact is due to
the inverse function theorem of classical multivariate calculus [64, §I.5]. For details,
see [39, Thm. 18.7, p. 497].


Theorem 7.1 ([39, Prop. 18.7, p. 500]). Let $h : \mathbb{R}^{n+d} \supset \Omega \to \mathbb{R}^d$ be differentiable and
$c_0 \in \mathbb{R}^d$ be defined such that the differential $Dh_p \in \mathbb{R}^{d \times (n+d)}$ has maximum possible rank
$d$ at every point $p \in \Omega$ with $h(p) = c_0$. Then the preimage

$$h^{-1}(c_0) = \{p \in \Omega \mid h(p) = c_0\}$$

is an n-dimensional submanifold of $\mathbb{R}^{n+d}$.


An obvious application of Theorem 7.1 to the function $h : \mathbb{R}^3 \to \mathbb{R}$, $(x_1, x_2, x_3) \mapsto
x_1^2 + x_2^2 + x_3^2 - 1$ establishes the unit sphere $S^2 = h^{-1}(0)$ as a 2-dimensional submanifold
of $\mathbb{R}^{2+1}$. As a more sophisticated example, we recognize the orthogonal group as a
differentiable (sub)manifold.


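Example 7.1. Consider the orthogonal group $O(n) \subset \mathbb{R}^{n \times n} \cong \mathbb{R}^{n^2}$ and the set of
symmetric matrices $\mathrm{sym}(n) \cong \mathbb{R}^{n(n+1)/2}$. Define $h : \mathbb{R}^{n \times n} \to \mathrm{sym}(n)$, $A \mapsto A^T A - I$. Then
$Dh_A(B) = A^T B + B^T A$. For $Q \in O(n)$, the differential is indeed surjective: For any
$M \in \mathrm{sym}(n)$, we have $Dh_Q(\frac{1}{2} Q M) = \frac{1}{2} Q^T Q M + \frac{1}{2} M^T Q^T Q = M$. As a consequence, the
orthogonal group $O(n)$ is a submanifold of dimension $n^2 - \frac{1}{2} n(n+1) = \frac{1}{2} n(n-1)$ of
the Euclidean matrix space $\mathbb{R}^{n \times n}$.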
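
The surjectivity argument of Example 7.1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `dh`, the matrix size `n = 4` and the random test matrices are illustrative choices and not part of the original text.

```python
import numpy as np

def dh(A, B):
    """Differential of h(A) = A^T A - I at A, applied to the direction B."""
    return A.T @ B + B.T @ A

rng = np.random.default_rng(0)
n = 4

# A random orthogonal matrix Q, obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# A random symmetric right-hand side M in sym(n).
S = rng.standard_normal((n, n))
M = 0.5 * (S + S.T)

# Preimage proposed in Example 7.1: B = (1/2) Q M.
B = 0.5 * Q @ M
residual = np.linalg.norm(dh(Q, B) - M)

# Dimension count: n^2 - n(n+1)/2 = n(n-1)/2.
dim_On = n * n - n * (n + 1) // 2

print(f"||Dh_Q(QM/2) - M|| = {residual:.2e}, dim O({n}) = {dim_On}")
```

Up to rounding errors, the residual vanishes for every symmetric M, which is the surjectivity claim used in the example.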
7.2.1 Intrinsic and extrinsic coordinates


As a rule, numerical data processing on manifolds requires calculations in explicit
coordinates. For differentiable submanifolds, we distinguish between two types: extrinsic
and intrinsic coordinates. Extrinsic coordinates address points on a submanifold
$\mathcal{M} \subset \mathbb{R}^{n+d}$ with respect to their coordinates in the ambient space $\mathbb{R}^{n+d}$, while intrinsic
coordinates are with respect to the local parameterizations. Hence, extrinsic coordinates
are what an outside observer would see, while intrinsic coordinates correspond
to the perspective of an observer that resides on the manifold. Let us exemplify these
concepts on the two-dimensional unit sphere $S^2$, embedded in $\mathbb{R}^3$. As a point set, the
sphere is defined by the equation

$$S^2 = \{(x_1, x_2, x_3)^T \in \mathbb{R}^3 \mid x_1^2 + x_2^2 + x_3^2 = 1\}.$$

Any three-vector $(x_1, x_2, x_3)^T \in S^2$ specifies a point on the sphere in extrinsic coordinates.
However, it is intuitively clear that $S^2$ is intrinsically a two-dimensional object.
Indeed, $S^2$ can be parameterized via


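$$f : \mathbb{R}^2 \supset (0, 2\pi)^2 \to S^2 \subset \mathbb{R}^3, \quad (\alpha, \beta) \mapsto \begin{pmatrix} \sin(\alpha)\cos(\beta) \\ \sin(\alpha)\sin(\beta) \\ \cos(\alpha) \end{pmatrix}.$$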


The parameter vector $(\alpha, \beta) \in \mathbb{R}^2$ specifies a point on $S^2$ in intrinsic coordinates. Even
though intrinsic coordinates directly reflect the dimension of the manifold at hand,
they often cannot be calculated explicitly and extrinsic coordinates are the preferred
choice in numerical applications [36, §2, p. 305]. Turning back to Example 7.1, we recall
that the intrinsic dimension of the orthogonal group is $\frac{1}{2} n(n-1)$. Yet, in practice, one
uses the extrinsic representation with (n×n)-matrices $Q$, keeping the defining equation
$Q^T Q = I$ in mind.
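
The passage between intrinsic and extrinsic coordinates on the sphere can be illustrated with a short sketch, assuming NumPy. The function names `f` and `jacobian` and the sample angles are hypothetical choices made for this illustration only.

```python
import numpy as np

def f(alpha, beta):
    """Spherical parameterization: intrinsic (alpha, beta) -> extrinsic point in R^3."""
    return np.array([np.sin(alpha) * np.cos(beta),
                     np.sin(alpha) * np.sin(beta),
                     np.cos(alpha)])

def jacobian(alpha, beta):
    """Jacobi matrix Df of the parameterization, a 3-by-2 matrix."""
    return np.array([[np.cos(alpha) * np.cos(beta), -np.sin(alpha) * np.sin(beta)],
                     [np.cos(alpha) * np.sin(beta),  np.sin(alpha) * np.cos(beta)],
                     [-np.sin(alpha),                0.0]])

alpha, beta = 0.7, 2.1     # intrinsic coordinates (arbitrary sample values)
p = f(alpha, beta)         # the same point in extrinsic coordinates

# The extrinsic three-vector satisfies the defining equation of the sphere.
constraint_defect = abs(p @ p - 1.0)

# Full rank 2 of the Jacobi matrix, as required of a parameterization in Definition 7.1.
rank = np.linalg.matrix_rank(jacobian(alpha, beta))

print(p, constraint_defect, rank)
```

Two numbers suffice to address the point intrinsically, whereas the extrinsic description carries three numbers together with one constraint.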


7.2.2 Tangent spaces 


We need a few more fundamental concepts. 


Definition 7.2 (Tangent space of a differentiable submanifold). Let $\mathcal{M} \subset \mathbb{R}^{n+d}$ be an
n-dimensional submanifold of $\mathbb{R}^{n+d}$. The tangent space of $\mathcal{M}$ at a point $p \in \mathcal{M}$, in
symbols $T_p\mathcal{M}$, is the space of velocity vectors of differentiable curves $c : t \mapsto c(t)$
passing through $p$, i.e.,

$$T_p\mathcal{M} = \{\dot{c}(t_0) \mid c : J \to \mathcal{M},\ c(t_0) = p\}.$$

Here, $J \subset \mathbb{R}$ is an arbitrarily small open interval with $t_0 \in J$.

Figure 7.1: Visualization of a manifold (curved surface) with the tangent space $T_p\mathcal{M}$ attached. The
tangent vector $v = \dot{c}(0) \in T_p\mathcal{M}$ is the velocity vector of a curve $c : t \mapsto c(t) \in \mathcal{M}$.


The concept is illustrated in Figure 7.1. It is straightforward to show that the tangent 
space is actually a vector space. Moreover, the tangent space can be characterized both 
with respect to intrinsic and extrinsic coordinates. 


Theorem 7.2 (Tangent space, intrinsic characterization). Let $\mathcal{M} \subset \mathbb{R}^{n+d}$ be an
n-dimensional submanifold of $\mathbb{R}^{n+d}$ and let $f : \mathbb{R}^n \supset D \to f(D) \subset \mathcal{M}$ be a parameterization.
Then, for $x \in D$ with $p = f(x) \in \mathcal{M}$, we have

$$T_p\mathcal{M} = \mathrm{ran}(Df_x).$$


Theorem 7.3 (Tangent space, extrinsic characterization). Let $h : \mathbb{R}^{n+d} \supset \Omega \to \mathbb{R}^d$ and
$c_0 \in \mathbb{R}^d$ be as in Theorem 7.1 and let $\mathcal{M} := h^{-1}(c_0) \subset \mathbb{R}^{n+d}$. Then, for $p \in \mathcal{M}$, we have

$$T_p\mathcal{M} = \ker(Dh_p).$$


Note that both Theorem 7.2 and Theorem 7.3 immediately show that the tangent
space $T_p\mathcal{M}$ is a vector space of the same dimension n as the manifold $\mathcal{M}$.
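
For the unit sphere, the two characterizations can be compared directly. The sketch below assumes NumPy and uses the spherical parameterization of Section 7.2.1 together with the constraint function $h(x) = x_1^2 + x_2^2 + x_3^2 - 1$; the variable names and the sample angles are illustrative.

```python
import numpy as np

alpha, beta = 0.7, 2.1   # arbitrary intrinsic sample coordinates

# Point on the sphere and Jacobi matrix of the parameterization f (Theorem 7.2).
p = np.array([np.sin(alpha) * np.cos(beta),
              np.sin(alpha) * np.sin(beta),
              np.cos(alpha)])
Df = np.array([[np.cos(alpha) * np.cos(beta), -np.sin(alpha) * np.sin(beta)],
               [np.cos(alpha) * np.sin(beta),  np.sin(alpha) * np.cos(beta)],
               [-np.sin(alpha),                0.0]])

# Differential of the constraint h at p (Theorem 7.3): Dh_p = 2 p^T, a 1-by-3 matrix.
Dh = 2.0 * p.reshape(1, 3)

# Every column of Df lies in the kernel of Dh, and the dimensions agree.
range_in_kernel = np.linalg.norm(Dh @ Df)
dim_range = np.linalg.matrix_rank(Df)
dim_kernel = 3 - np.linalg.matrix_rank(Dh)

print(range_in_kernel, dim_range, dim_kernel)
```

Since ran(Df) is contained in ker(Dh) and both are two-dimensional, the two subspaces coincide at the chosen point.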


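Example 7.2. The tangent space of the orthogonal group $O(n)$ at a point $Q_0$ is

$$T_{Q_0} O(n) = \{\Delta \in \mathbb{R}^{n \times n} \mid \Delta^T Q_0 = -Q_0^T \Delta\}.$$

This fact can be established via considering a matrix curve $Q : t \mapsto Q(t)$ with $Q(0) = Q_0$
and velocity vector $\Delta = \dot{Q}(0) \in T_{Q_0} O(n)$. Then

$$0 = \frac{d}{dt}\Big|_{t=0} I = \frac{d}{dt}\Big|_{t=0} Q^T(t) Q(t) = \Delta^T Q_0 + Q_0^T \Delta.$$

(The claim follows by counting the dimension of the subspace $\{\Delta^T Q_0 = -Q_0^T \Delta\}$.) As an
alternative, we can consider $h : \mathbb{R}^{n \times n} \to \mathrm{sym}(n)$, $A \mapsto A^T A - I$ as in Example 7.1. Then
$Dh_{Q_0}(\Delta) = Q_0^T \Delta + \Delta^T Q_0$ and $T_{Q_0} O(n) = \ker(Dh_{Q_0})$.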
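
The dimension count mentioned in Example 7.2 can be reproduced numerically. The following sketch, assuming NumPy, assembles a matrix representation of $Dh_{Q_0}$ with respect to the canonical basis of $\mathbb{R}^{n \times n}$ and determines the dimension of its kernel; the helper `dh_Q0` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random point on O(n)

def dh_Q0(D):
    """Differential of h(A) = A^T A - I at Q0, applied to the direction D."""
    return Q0.T @ D + D.T @ Q0

# Candidate tangent vector: Q0 times a skew-symmetric matrix V.
W = rng.standard_normal((n, n))
V = W - W.T
Delta = Q0 @ V
constraint_residual = np.linalg.norm(dh_Q0(Delta))

# Matrix representation of Dh_{Q0} w.r.t. the canonical basis E_ij of R^{n x n}.
columns = []
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        columns.append(dh_Q0(E).ravel())
L = np.column_stack(columns)

# dim ker = n^2 - rank; the rank equals dim sym(n) = n(n+1)/2.
kernel_dim = n * n - np.linalg.matrix_rank(L)

print(constraint_residual, kernel_dim)
```

The kernel has dimension n(n-1)/2, in agreement with the dimension of O(n) found in Example 7.1.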
7.2.3 Geodesics and the Riemannian distance function


One of the most important problems in both general differential geometry and data
processing on manifolds is to determine the shortest connection between two points
on a given manifold. This requires one to measure the lengths of curves. Recall that
the length of a curve $c : [a,b] \to \mathbb{R}^n$ in the Euclidean space is $L(c) = \int_a^b \|\dot{c}(t)\| \, dt$. In
order to transfer this to the manifold setting, an inner product for tangent vectors is
needed that is consistent with the manifold structure.


Definition 7.3 (Riemannian metrics). Let $\mathcal{M}$ be a differentiable submanifold of $\mathbb{R}^{n+d}$.
A Riemannian metric on $\mathcal{M}$ is a family $(\langle \cdot, \cdot \rangle_p)_{p \in \mathcal{M}}$ of inner products $\langle \cdot, \cdot \rangle_p : T_p\mathcal{M} \times
T_p\mathcal{M} \to \mathbb{R}$ that is smooth in variations of the base point $p$.

The length of a tangent vector $v \in T_p\mathcal{M}$ is $\|v\|_p := \sqrt{\langle v, v \rangle_p}$.² The length of a curve
$c : [a,b] \to \mathcal{M}$ is defined as

$$L(c) = \int_a^b \|\dot{c}(t)\|_{c(t)} \, dt = \int_a^b \sqrt{\langle \dot{c}(t), \dot{c}(t) \rangle_{c(t)}} \, dt.$$

A curve is said to be parameterized by the arc length, if $L(c|_{[a,t]}) = t - a$ for all
$t \in [a,b]$. Obviously, unit-speed curves with $\|\dot{c}(t)\|_{c(t)} \equiv 1$ are parameterized by the arc
length. Constant-speed curves with $\|\dot{c}(t)\|_{c(t)} \equiv v_0$ are parameterized proportional to
the arc length. The Riemannian distance between two points $p, q \in \mathcal{M}$ with respect to
a given metric is

$$\mathrm{dist}_{\mathcal{M}}(p, q) = \inf\{L(c) \mid c : [a,b] \to \mathcal{M} \text{ piecewise smooth},\ c(a) = p,\ c(b) = q\}, \qquad (7.5)$$

where, by convention, $\inf\{\emptyset\} = \infty$.
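
To make the length functional concrete, the following sketch approximates the length of a curve on the unit sphere by summing chord lengths of a fine sampling. It assumes NumPy, the metric induced by the ambient Euclidean inner product, and the well-known closed form arccos(p·q) for the great-circle distance on the unit sphere (not stated in the text above); the sample curve is a hypothetical choice.

```python
import numpy as np

def polyline_length(points):
    """Approximate L(c) by summing Euclidean chord lengths of a fine sampling."""
    return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

# A quarter of the latitude circle at polar angle alpha0 (not a great circle).
alpha0 = np.pi / 3
betas = np.linspace(0.0, np.pi / 2, 2001)
latitude_arc = np.column_stack([np.sin(alpha0) * np.cos(betas),
                                np.sin(alpha0) * np.sin(betas),
                                np.full_like(betas, np.cos(alpha0))])
p, q = latitude_arc[0], latitude_arc[-1]

length_latitude = polyline_length(latitude_arc)

# Great-circle distance on the unit sphere (assumed closed form).
dist_pq = float(np.arccos(np.clip(p @ q, -1.0, 1.0)))

print(f"L(latitude arc) = {length_latitude:.6f}, dist(p, q) = {dist_pq:.6f}")
```

The latitude arc connects p and q but is longer than the Riemannian distance, which is an infimum over all admissible curves.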


Hence, a shortest path between $p, q \in \mathcal{M}$ is a curve $c$ that connects $p$ and $q$ such
that $L(c) = \mathrm{dist}_{\mathcal{M}}(p, q)$. In general, shortest paths on $\mathcal{M}$ need not exist.³ Yet, candidates
for shortest curves between points that are sufficiently close to each other can be
obtained via a variational principle: Given a parametric family of suitably regular curves
$c_s : t \mapsto c_s(t) \in \mathcal{M}$, $s \in (-\varepsilon, \varepsilon)$, that connect the same fixed endpoints $c_s(a) = p$ and
$c_s(b) = q$ for all $s$, one can consider the length functional $s \mapsto L(c_s)$. A curve $c = c_0$
is a first-order candidate for a shortest path between $p$ and $q$, if it is a critical point of
the length functional, i.e., if $\frac{d}{ds}\big|_{s=0} L(c_s) = 0$. Such curves are called geodesics.
Differentiating the length functional leads to the so-called first variation formula [65, §6],
which, in turn, leads to the characterizing equation for geodesics:


2 This notation should not be confused with the classical p-norm $\sqrt[p]{\sum_i |v_i|^p}$.

3 Consider $\mathbb{R}^2_* = \mathbb{R}^2 \setminus \{(0, 0)\}$ with the Euclidean inner product. There is no shortest connection from
(-1, 0) to (1, 0) on $\mathbb{R}^2_*$. A sequence of curves that is in $\mathbb{R}^2_*$ and converges to the curve $c : [-1, 1] \to
\mathbb{R}^2$, $t \mapsto (t, 0)$ is readily constructed. Hence, the Riemannian distance between (-1, 0) and (1, 0) is 2.
Yet, every curve connecting these points must go around the origin. The length-minimizing curve of
length 2 crosses the origin and is thus not an admissible curve on $\mathbb{R}^2_*$.


Definition 7.4 (Geodesics). A differentiable curve $c : [a,b] \to \mathcal{M}$ is called a geodesic
(w.r.t. a given Riemannian metric), if the covariant derivative of its velocity vector
field vanishes, i.e.,

$$\frac{D\dot{c}}{dt}(t) = 0 \quad \forall t \in [a,b]. \qquad (7.6)$$


Remark 7.1. If a starting point $c(0) = p \in \mathcal{M}$ and a starting velocity $\dot{c}(0) = v \in T_p\mathcal{M}$
are specified, then the geodesic equation (7.6) translates to an initial value problem of
second order with guaranteed existence and uniqueness of local solutions [3, p. 102].


An immediate consequence of (7.6) is that geodesics are constant-speed curves. A
formal introduction of the covariant derivative $\frac{D}{dt}$ along a curve is beyond the scope
of this contribution, and the interested reader is referred to, e.g., [65, §4, §5]. To get
some intuition, we introduce this concept for embedded Riemannian submanifolds
$\mathcal{M} \subset \mathbb{R}^{n+d}$, where the metric is the Euclidean metric of $\mathbb{R}^{n+d}$ restricted to the tangent
bundle; see also [39, §20.12]:

A vector field along a curve $c : [a,b] \to \mathcal{M}$ is a differentiable map $v : [a,b] \to \mathbb{R}^{n+d}$
such that $v(t) \in T_{c(t)}\mathcal{M}$.⁴ For every $p \in \mathcal{M}$, the ambient $\mathbb{R}^{n+d}$ decomposes into an
orthogonal direct sum

$$\mathbb{R}^{n+d} = T_p\mathcal{M} \oplus T_p\mathcal{M}^{\perp},$$

where $T_p\mathcal{M}^{\perp}$ is the orthogonal complement of $T_p\mathcal{M}$ and orthogonality is w.r.t. the
standard Euclidean inner product on $\mathbb{R}^{n+d}$. Let $\Pi_p : \mathbb{R}^{n+d} \to T_p\mathcal{M}$ be the (base
point-dependent) orthogonal projection onto the tangent space at $p$. In this setting (and only
in this), the covariant derivative of a vector field $v(t)$ along a curve $c(t)$ is the tangent
component of $\dot{v}(t)$, i.e., $\frac{Dv}{dt}(t) = \Pi_{c(t)}(\dot{v}(t))$. As a consequence,

$$\frac{D\dot{c}}{dt}(t) = \Pi_{c(t)}(\ddot{c}(t)), \qquad (7.7)$$

and the geodesics on Riemannian submanifolds with the metric induced by the ambient
Euclidean inner product are precisely the constant-speed curves with acceleration
vectors orthogonal to the corresponding tangent spaces, i.e., $\ddot{c}(t) \in T_{c(t)}\mathcal{M}^{\perp}$.


Example 7.3. On the unit sphere $S^2 \subset \mathbb{R}^3$, the geodesics are great circles. When
considered as curves in the ambient $\mathbb{R}^3$, their acceleration vector points directly to the
origin and is thus orthogonal to the corresponding tangent space; see the cartoon below.
When viewed as entities of $S^2$, these curves do not experience any acceleration at
all.

[Illustration: a great circle $c(t)$ on the sphere.]

Mind that a constant-speed curve in $\mathbb{R}^n$ changes its direction only when it experiences
a non-zero acceleration. In this sense, geodesics on manifolds are the counterparts to
straight lines in the Euclidean space.


4 The prime example for such a vector field is the curve's own velocity field $v(t) = \dot{c}(t)$.
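
The characterization of geodesics on embedded submanifolds can be tested on the sphere. In the sketch below (NumPy assumed), the tangential part of the acceleration is computed for a great circle and, for comparison, for a latitude circle that is not a geodesic. The curve definitions and helper names are illustrative choices.

```python
import numpy as np

def proj_tangent(p, w):
    """Orthogonal projection of w onto T_p S^2 = {v : p^T v = 0}."""
    return w - (p @ w) * p

def great_circle(t):
    """Unit-speed great circle in the equatorial plane and its acceleration."""
    c = np.array([np.cos(t), np.sin(t), 0.0])
    return c, -c

def latitude_circle(t, alpha=np.pi / 3):
    """Constant-speed latitude circle at polar angle alpha and its acceleration."""
    s, z = np.sin(alpha), np.cos(alpha)
    c = np.array([s * np.cos(t), s * np.sin(t), z])
    acc = np.array([-s * np.cos(t), -s * np.sin(t), 0.0])
    return c, acc

t = 0.4
c_geo, acc_geo = great_circle(t)
c_lat, acc_lat = latitude_circle(t)

# Covariant acceleration = tangential component of the ambient acceleration.
geodesic_norm = np.linalg.norm(proj_tangent(c_geo, acc_geo))
latitude_norm = np.linalg.norm(proj_tangent(c_lat, acc_lat))

print(geodesic_norm, latitude_norm)
```

The great circle has vanishing tangential acceleration, while the latitude circle retains a tangential component of magnitude sin(alpha) cos(alpha) although it is also a constant-speed curve.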


In general, a covariant derivative, also known as a linear connection, is a bilinear
mapping $(X, Y) \mapsto \nabla_X Y$ that maps two vector fields $X, Y$ to a third vector field $\nabla_X Y$ in
such a way that it can be interpreted as the directional derivative of $Y$ in the direction
of $X$. Of importance is the Riemannian connection or Levi-Civita connection that
is compatible with a Riemannian metric [3, Thm 5.3.1], [65, Thm 5.4]. It is determined
uniquely by the Koszul formula,

$$2\langle \nabla_X Y, Z \rangle = X(\langle Y, Z \rangle) + Y(\langle Z, X \rangle) - Z(\langle X, Y \rangle)
 - \langle X, [Y, Z] \rangle - \langle Y, [X, Z] \rangle + \langle Z, [X, Y] \rangle,$$

and is used to define the Riemannian curvature tensor

$$(X, Y, Z) \mapsto R(X, Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z.⁵$$

A Riemannian manifold is flat if and only if it is locally isometric to the Euclidean
space, which holds if and only if the Riemannian curvature tensor vanishes identically
[65, Thm. 7.3]. Hence, 'flatness' depends on the Riemannian metric.


7.2.4 Normal coordinates 


The local uniqueness and existence of geodesics allows us to map a tangent vector $v \in
T_p\mathcal{M}$ to the endpoint of a geodesic that starts from $p \in \mathcal{M}$ with velocity $v$. Formalizing
this principle gives rise to the Riemannian exponential,

$$\mathrm{Exp}_p^{\mathcal{M}} : T_p\mathcal{M} \supset B_\varepsilon(0) \to \mathcal{M}, \quad v \mapsto q := \mathrm{Exp}_p^{\mathcal{M}}(v) := c_{p,v}(1). \qquad (7.8)$$

Here, $t \mapsto c_{p,v}(t)$ is the geodesic that starts from $p$ with velocity $v$ and $B_\varepsilon(0) \subset T_p\mathcal{M}$
is the open ball with radius $\varepsilon$ and center 0 in the tangent space;⁶ see Figure 7.2. Note
that we can restrict the considerations to unit-speed geodesics via

$$\mathrm{Exp}_p^{\mathcal{M}}(v) := c_{p,v}(1) = c_{p, v/\|v\|}(t_v) = \mathrm{Exp}_p^{\mathcal{M}}\Big(t_v \frac{v}{\|v\|}\Big),$$

where $t_v = \|v\|$; see [65, §5, p. 72 ff.] for the details.

Figure 7.2: The Riemannian exponential sends tangent vectors to endpoints of geodesic curves.

5 In these formulas, $[X, Y] = X(Y) - Y(X)$ is the Lie bracket of two vector fields.
6 For technical reasons, $\varepsilon > 0$ must be chosen small enough such that $c_{p,v}(t)$ is defined on the unit
interval [0, 1].

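For $\varepsilon > 0$ small enough, the Riemannian exponential is a smooth diffeomorphism
between $B_\varepsilon(0)$ and an open domain $D_p \subset \mathcal{M}$ around the point $p$. Hence, it is
invertible. The smooth inverse map is called the Riemannian logarithm and is denoted
by

$$\mathrm{Log}_p^{\mathcal{M}} : \mathcal{M} \supset D_p \to B_\varepsilon(0) \subset T_p\mathcal{M}, \quad q \mapsto v := (\mathrm{Exp}_p^{\mathcal{M}})^{-1}(q), \qquad (7.9)$$

where $v$ satisfies $c_{p,v}(1) = q$.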

Thus, the Riemannian logarithm is associated with the geodesic endpoint problem:
Given $p, q \in \mathcal{M}$, find a geodesic that connects $p$ and $q$. The Riemannian exponential
map establishes a local parametrization of a small region around a location $p \in \mathcal{M}$ in
terms of coordinates of the flat vector space $T_p\mathcal{M}$. This is referred to as representing
the manifold in normal coordinates [60, §III.8], [65, Lem. 5.10]. Normal coordinates
are radially isometric in the sense that the Riemannian distance between $p$ and $q =
\mathrm{Exp}_p^{\mathcal{M}}(v)$ is exactly the same as the length of the tangent vector $\|v\|_p$ as measured in
the metric on $T_p\mathcal{M}$, provided that $v$ is contained in a neighborhood of $0 \in T_p\mathcal{M}$, where
the exponential is invertible [65, Lem. 5.10 & Cor. 6.11].

Mind that the definition of the Riemannian exponential depends on the geodesics,
which, in turn, depend on the chosen Riemannian metric via Definition 7.3. Different
metrics lead to different geodesics and thus to different exponential and logarithm
maps.
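
For the unit sphere with the metric induced by the ambient Euclidean inner product, the exponential and logarithm are available in closed form. The sketch below assumes NumPy and these standard closed-form expressions, which are not given in the text above; the function names `sphere_exp` and `sphere_log` are hypothetical. It checks that the two maps are mutually inverse near the base point and that normal coordinates are radially isometric.

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential on the unit sphere: follow the great circle from p along v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Riemannian logarithm on the unit sphere (q must not be antipodal to p)."""
    w = q - (p @ q) * p                 # component of q tangent to the sphere at p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return theta * w / nw

p = np.array([0.0, 0.0, 1.0])           # base point (north pole)
v = np.array([0.3, -0.4, 0.0])          # tangent vector at p, since p^T v = 0

q = sphere_exp(p, v)
v_back = sphere_log(p, q)
dist = np.arccos(np.clip(p @ q, -1.0, 1.0))   # great-circle distance

print(np.linalg.norm(v_back - v), dist, np.linalg.norm(v))
```

On the sphere, the logarithm is well-defined as long as q is not the antipode of p, which illustrates the local nature of normal coordinates.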
7.2.5 Matrix Lie groups and quotients by group actions


In general, a Lie group is a differentiable manifold $G$ which also has a group structure,
such that the group operations 'multiplication' and 'inversion',

$$G \times G \ni (g, \tilde{g}) \mapsto g \cdot \tilde{g} \in G \quad \text{and} \quad G \ni g \mapsto g^{-1} \in G,$$

are both smooth [39, 46, 41]. A matrix Lie group $G$ is a subgroup of $GL(n, \mathbb{C})$ that is
closed in $GL(n, \mathbb{C})$.⁷ This definition already implies that $G$ is an embedded submanifold
of $\mathbb{C}^{n \times n}$ [46, Corollary 3.45]. Not all matrix groups are Lie groups and not all Lie groups
are matrix Lie groups; see [46, §1.1 and §4.8]. However, matrix Lie groups are arguably
the most important class of Lie groups when it comes to practical applications and
this exposition is restricted to this subclass.

Let $G$ be an arbitrary matrix Lie group. When endowed with the bracket operator
or matrix commutator $[V, W] = VW - WV$, the tangent space $T_I G$ at the identity
is called the Lie algebra associated with the Lie group $G$; see [46, §3]. As such, it is
denoted by $\mathfrak{g} = T_I G$. For any $A \in G$, the function "left-multiplication with A" is a
diffeomorphism $L_A : G \to G$, $L_A(B) = AB$; its differential at a point $M \in G$ is the
isomorphism $d(L_A)_M : T_M G \to T_{L_A(M)} G$, $d(L_A)_M(V) = AV$. Using this observation at $M = I$
shows that the tangent space at an arbitrary location $A \in G$ is given by the translates
(by left-multiplication) of the tangent space at the identity:

$$T_A G = T_{L_A(I)} G = A\mathfrak{g} = \{\Delta = AV \in \mathbb{R}^{n \times n} \mid V \in \mathfrak{g}\}, \qquad (7.10)$$

[41, §5.6, p. 160]. The Lie algebra $\mathfrak{g} = T_I G$ of $G$ can equivalently be characterized as
the set of all matrices $\Delta$ such that $\exp_m(t\Delta) \in G$ for all $t \in \mathbb{R}$. The intuition behind
this fact is that all tangent vectors are velocity vectors of smooth curves running on
$G$ (Definition 7.2) and that $c(t) = \exp_m(t\Delta)$ is a smooth curve starting from $c(0) = I$
with velocity $\dot{c}(0) = \Delta$; see [46, Def. 3.18 & Cor. 3.46] for the details. By definition, the
exponential map⁸ for a matrix Lie group is the matrix exponential restricted to the
corresponding Lie algebra, i.e., the tangent space at the identity $\mathfrak{g} = T_I G$ [46, §3.7],

$$\exp_m|_{\mathfrak{g}} : \mathfrak{g} \to G.$$

In general, a Lie algebra is a vector space with a bilinear, skew-symmetric bracket
operation, called Lie bracket $[\cdot, \cdot]$, that satisfies the Jacobi identity

$$[X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0.$$
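
These statements can be tried out for the rotation group, whose Lie algebra consists of the skew-symmetric matrices (cf. Example 7.2 at $Q_0 = I$). The sketch below assumes NumPy; the helper `expm_series` is a simple scaling-and-squaring Taylor approximation written for this illustration, and in practice one would rather call an established library routine for the matrix exponential.

```python
import numpy as np

def expm_series(A, terms=20, squarings=6):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    B = A / (2 ** squarings)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def bracket(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

rng = np.random.default_rng(2)

def random_skew(n):
    W = rng.standard_normal((n, n))
    return W - W.T

n = 3
X, Y, Z = random_skew(n), random_skew(n), random_skew(n)

# exp_m maps the Lie algebra skew(n) into the group SO(n).
Q = expm_series(0.8 * X)
orth_defect = np.linalg.norm(Q.T @ Q - np.eye(n))
det_Q = np.linalg.det(Q)

# The commutator of two skew-symmetric matrices is skew-symmetric again ...
closure_defect = np.linalg.norm(bracket(X, Y) + bracket(X, Y).T)
# ... and the commutator satisfies the Jacobi identity.
jacobi_residual = np.linalg.norm(bracket(X, bracket(Y, Z))
                                 + bracket(Z, bracket(X, Y))
                                 + bracket(Y, bracket(Z, X)))

print(orth_defect, det_Q, closure_defect, jacobi_residual)
```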


7 But not necessarily in $\mathbb{C}^{n \times n}$.
8 The exponential map of a Lie group must not be confused with the Riemannian exponential.

Quotients of Lie groups by closed subgroups

In many settings, it is important or sometimes even necessary to consider certain 
points p, q on a given differentiable manifold M as equivalent. Consider the following 
example. 


Example 7.4. Let $U \in \mathbb{R}^{n \times r}$ feature orthonormal columns so that $U^T U = I_r$. We may
extend the columns of $U = (u^1, \ldots, u^r)$ to an orthogonal matrix $Q = (u^1, \ldots, u^r, u^{r+1}, \ldots,
u^n) \in O(n)$. Define $I_r \times O(n-r) := \left\{ \begin{pmatrix} I_r & 0 \\ 0 & R \end{pmatrix} \mid R \in O(n-r) \right\}$. This is actually a closed
subgroup of $O(n)$, in symbols $(I_r \times O(n-r)) \leq O(n)$. The action $Q \mapsto Q\Phi$ with any
orthogonal matrix $\Phi \in I_r \times O(n-r)$ preserves the first r columns of $Q$. Hence, we
may identify $U$ with the equivalence class $[Q] = \{Q\Phi \mid \Phi \in I_r \times O(n-r)\} \subset O(n)$. In
Sections 7.4.4 and 7.4.5, we will see that this example establishes the Stiefel manifold
of ONBs and eventually also the Grassmann manifold of subspaces as quotients of the
orthogonal group $O(n)$.
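
The identification in Example 7.4 is easy to reproduce numerically. The sketch below, assuming NumPy, builds a random representative Q, multiplies it from the right by a random element of $I_r \times O(n-r)$ and confirms that the first r columns are unchanged; sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 2

# Orthogonal matrix Q whose first r columns form the Stiefel representative U.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q[:, :r]

# Random element Phi = diag(I_r, R) of the subgroup I_r x O(n - r).
R, _ = np.linalg.qr(rng.standard_normal((n - r, n - r)))
Phi = np.eye(n)
Phi[r:, r:] = R

# Another representative of the same equivalence class [Q].
Q_tilde = Q @ Phi

first_cols_defect = np.linalg.norm(Q_tilde[:, :r] - U)
orth_defect = np.linalg.norm(Q_tilde.T @ Q_tilde - np.eye(n))

print(first_cols_defect, orth_defect)
```

All representatives of [Q] share the same leading n-by-r block U, which is why the class can be identified with a single point with orthonormal columns.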


Note that in the example, the equivalence relation is induced by actions of the
Lie group $I_r \times O(n-r)$. Quotients that arise from such group actions are important
examples of quotient manifolds. Theorems 7.4 and 7.5 cover this example as well as all
other cases of quotient manifolds that are featured in this chapter. First, group actions
need to be formalized.


Definition 7.5 (Cf. [66, p. 162, 163]). Let $G$ be a Lie group, $\mathcal{M}$ be a smooth manifold,
and let $G \times \mathcal{M} \to \mathcal{M}$, $(g, p) \mapsto g \cdot p$ be a left action of $G$ on $\mathcal{M}$.⁹ The orbit relation on $\mathcal{M}$
induced by $G$ is defined by

$$p \simeq q \ :\Leftrightarrow\ \exists g \in G : g \cdot p = q.$$

The equivalence classes are the G-orbits $[p] := Gp := \{g \cdot p \mid g \in G\}$. The orbit space is
denoted by $\mathcal{M}/G := \{[p] \mid p \in \mathcal{M}\}$. The quotient map sends a point to its G-orbit via $\Pi :
\mathcal{M} \to \mathcal{M}/G$, $p \mapsto [p]$. The action is free, if every isotropy group $G_p := \{g \in G \mid g \cdot p = p\}$
is trivial, $G_p = \{e\}$.


Theorem 7.4 (Quotient manifold theorem, cf. [66, Thm. 21.10]). Suppose $G$ is a Lie
group acting smoothly, freely, and properly on a smooth manifold $\mathcal{M}$. Then the orbit
space $\mathcal{M}/G$ is a manifold of dimension $\dim \mathcal{M} - \dim G$, and has a unique smooth
structure such that the quotient map $\Pi : \mathcal{M} \to \mathcal{M}/G$, $p \mapsto [p]$ is a smooth submersion.¹⁰
In this context, $\mathcal{M}$ is called the total space and $\mathcal{M}/G$ is the quotient (space).


A special case is Lie groups under actions of Lie subgroups. 


9 The theory for right actions is analogous. In all cases considered in this chapter, $\mathcal{M}$ is a matrix
manifold so that "$\cdot$" is the usual matrix product.
10 I.e., a smooth surjective mapping such that the differential is surjective at every point.

Definition 7.6. [66, §21, p. 551] Let $G$ be a Lie group and $H \leq G$ be a Lie subgroup. For
$g \in G$, a subset of $G$ of the form $[g] := gH = \{g \cdot h \mid h \in H\}$ is called a left coset of $H$. The
left cosets form a partition of $G$, and the quotient space determined by this partition
is called the left coset space of $G$ modulo $H$, and is denoted by $G/H$.


Coset spaces of Lie groups are again smooth manifolds. 


Theorem 7.5 (Cf. [66, Thm 21.17, p. 551]). Let $G$ be a Lie group and let $H$ be a closed
subgroup of $G$. The left coset space $G/H$ is a manifold of dimension $\dim G - \dim H$ with
a unique differentiable structure such that the quotient map $\Pi : G \to G/H$, $g \mapsto [g]$ is a
smooth submersion.


In general, if $\pi : \mathcal{M} \to \mathcal{N}$ is a surjective submersion between two manifolds $\mathcal{M}$
and $\mathcal{N}$, then for any $q \in \mathcal{N}$, the preimage $\pi^{-1}(q) \subset \mathcal{M}$ is called the fiber over $q$, and
is denoted by $\mathcal{M}_q$. Each fiber $\mathcal{M}_q$ is itself a closed, embedded submanifold by the
implicit function theorem. If $\mathcal{M}$ has a Riemannian metric $\langle \cdot, \cdot \rangle^{\mathcal{M}}$, then at each point
$p \in \mathcal{M}$, the tangent space $T_p\mathcal{M}$ decomposes into an orthogonal direct sum $T_p\mathcal{M} =
T_p\mathcal{M}_{\pi(p)} \oplus (T_p\mathcal{M}_{\pi(p)})^{\perp}$. The tangent space of the fiber $T_p\mathcal{M}_{\pi(p)} =: V_p$ is called the
vertical space, its orthogonal complement $H_p := V_p^{\perp}$ is the horizontal space. The
vertical space is the kernel $V_p = \ker(d\pi_p)$ of the differential $d\pi_p : T_p\mathcal{M} \to T_{\pi(p)}\mathcal{N}$; the
horizontal space is isomorphic to $T_{\pi(p)}\mathcal{N}$. This allows one to identify $H_p \cong T_{\pi(p)}\mathcal{N}$; see
[3, Figure 3.8, p. 44] for an illustration. This construction helps to compute tangent
spaces of quotients, if the tangent space of the total space is known.

If $G/H$ is a quotient as in Theorem 7.4 or 7.5 and if $\Pi : G \to G/H$ is the corresponding
quotient map, then $\Pi$ is a local diffeomorphism. A Riemannian metric on the quotient
can be defined by

$$\langle v, w \rangle_{[g]}^{G/H} = \langle (d\Pi_g)^{-1}(v), (d\Pi_g)^{-1}(w) \rangle_g^{G}, \quad v, w \in T_{[g]}(G/H). \qquad (7.11)$$

For this (and only this) metric, the quotient map is a local isometry.

In fact, Theorem 7.5 additionally establishes $G/H$ as a homogeneous space, i.e., a
smooth manifold $\mathcal{M}$ endowed with a transitive smooth action by a Lie group (cf. [66,
§21, p. 550]). In the setting of the theorem, the group action is the left action
of $G$ on $G/H$ given by $g_1 \cdot [g_2] := [g_1 \cdot g_2]$. A transitive action allows us to transport a
location $p \in \mathcal{M}$ to any other location $q \in \mathcal{M}$.


7.3 Interpolation on non-flat manifolds 


When working with matrix manifolds, the data is usually given in extrinsic coordinates;
see Section 7.2. For example, data on the compact Stiefel manifold $St(n,r) =
\{U \in \mathbb{R}^{n \times r} \mid U^T U = I_r\}$, $r < n$, is given in the form of n-by-r matrices. These matrices
feature nr entries while the intrinsic number of degrees of freedom, i.e., the intrinsic
dimension, is $nr - \frac{1}{2} r(r+1)$; see Section 7.4.4. Essentially, the practical obstacle
associated with data interpolation on matrix manifolds arises from this fact. Given, say,
k matrices on $St(n, r)$ in extrinsic coordinates, interpolating entry-by-entry will most
certainly lead to interpolants that do not feature orthogonal columns and thus are not
points on the Stiefel manifold. Likewise, entry-by-entry interpolation of positive definite
matrices is not guaranteed to produce another positive definite matrix.
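
A small worked example shows how entry-by-entry interpolation leaves the Stiefel manifold. The sketch assumes NumPy; the two sample points on St(3, 2) are hypothetical and chosen so that the result can be verified by hand.

```python
import numpy as np

# Two points on St(3, 2): both matrices have orthonormal columns.
U1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
U2 = np.array([[0.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])

# Entry-by-entry linear interpolation at the midpoint.
U_mid = 0.5 * (U1 + U2)

# Deviation from the defining equation U^T U = I_r.
gram = U_mid.T @ U_mid
defect = np.linalg.norm(gram - np.eye(2))

print(gram)
print(defect)
```

Here the Gram matrix of the midpoint is [[0.5, 0.25], [0.25, 0.5]] instead of the identity, so the entrywise interpolant is not a point on St(3, 2).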

There are essentially two different approaches to address this issue: performing
the interpolation on the tangent space of the manifold, and using the Riemannian
barycenter or Riemannian center of mass as an interpolant. Both will be explained
in more detail in the next two subsections.¹¹


7.3.1 Interpolation in normal coordinates 


As outlined in Section 7.2, every location $p \in \mathcal{M}$ on an n-dimensional differentiable
manifold features a small neighborhood $D_p$ that is the domain of a coordinate chart
$x : \mathcal{M} \supset D_p \to D_{x(p)} \subset \mathbb{R}^n$ that maps bijectively onto an open set $D_{x(p)} \subset \mathbb{R}^n$. Therefore,
for a sample data set $\{p_1, \ldots, p_k\} \subset D_p$ that is completely contained in the domain of a
single coordinate chart $x$, interpolation can be performed as follows:

1. Map the data set to $D_{x(p)}$: Calculate $v_1 = x(p_1), \ldots, v_k = x(p_k) \in D_{x(p)}$.

2. Interpolate in $D_{x(p)}$ to produce the interpolant $v^* \in D_{x(p)}$.

3. Map back to the manifold: Compute $p^* = x^{-1}(v^*) \in D_p$.


In principle, any coordinate chart may be applied. In practice, the challenge is to find
a suitable coordinate chart that can be evaluated efficiently. Moreover, it is desirable
that the chosen chart preserves the geometry of the original data set as well as
possible.¹² The standard choice is to use normal coordinates as introduced in Section 7.2.4.
This means that the Riemannian logarithm is used as the coordinate chart

$$\mathrm{Log}_p^{\mathcal{M}} : \mathcal{M} \supset D_p \to B_\varepsilon(0) \subset T_p\mathcal{M}$$

with the Riemannian exponential

$$\mathrm{Exp}_p^{\mathcal{M}} : T_p\mathcal{M} \supset B_\varepsilon(0) \to D_p \subset \mathcal{M}$$

as the corresponding parameterization. The general procedure of data interpolation
via the tangent space is formulated as Algorithm 7.1.


11 The German-speaking reader may find an introduction that addresses a general scientific audience
in [90].
12 There are no isometric coordinate charts on a non-flat manifold; see [65, Thm 7.3].

Algorithm 7.1: Interpolation in normal coordinates.

Input: Data set $\{p_1, \ldots, p_k\} \subset \mathcal{M}$.
1: Choose $p_i \in \{p_1, \ldots, p_k\}$ as a base point.
2: Check that $\mathrm{Log}_{p_i}^{\mathcal{M}}(p_j)$ is well-defined for all $j = 1, \ldots, k$.
3: for $j = 1, \ldots, k$ do
4:     Compute $v_j := \mathrm{Log}_{p_i}^{\mathcal{M}}(p_j) \in T_{p_i}\mathcal{M}$.
5: end for
6: Compute $v^*$ via Euclidean interpolation of $\{v_1, \ldots, v_k\}$.
7: Compute $p^* := \mathrm{Exp}_{p_i}^{\mathcal{M}}(v^*)$.
Output: $p^* \in \mathcal{M}$.
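
As an illustration, the steps of Algorithm 7.1 can be sketched for data on the unit sphere. The code assumes NumPy, the standard closed-form exponential and logarithm of the sphere (not taken from the text), and Lagrange interpolation as the Euclidean interpolation method in step 6; all function names and the sample trajectory `sample_curve` are hypothetical.

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential on the unit sphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Riemannian logarithm on the unit sphere (q not antipodal to p)."""
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(p @ q, -1.0, 1.0)) * w / nw

def lagrange_weights(mus, mu_star):
    """Values of the Lagrange basis polynomials at mu_star."""
    mus = np.asarray(mus, dtype=float)
    weights = np.ones(len(mus))
    for j in range(len(mus)):
        for i in range(len(mus)):
            if i != j:
                weights[j] *= (mu_star - mus[i]) / (mus[j] - mus[i])
    return weights

def interpolate_normal_coordinates(points, mus, mu_star, base=0):
    p_base = points[base]                                   # step 1: base point
    if any(p_base @ q <= -1.0 + 1e-12 for q in points):     # step 2: Log defined?
        raise ValueError("Log is not well-defined at an antipodal point")
    tangents = [sphere_log(p_base, q) for q in points]      # steps 3-5
    weights = lagrange_weights(mus, mu_star)                # step 6
    v_star = sum(w * v for w, v in zip(weights, tangents))
    return sphere_exp(p_base, v_star)                       # step 7

def sample_curve(mu):
    """Hypothetical trajectory on the sphere, used to generate sample data."""
    a, b = 0.5 + 0.3 * mu, mu
    return np.array([np.sin(a) * np.cos(b), np.sin(a) * np.sin(b), np.cos(a)])

mus = [0.0, 0.5, 1.0]
points = [sample_curve(mu) for mu in mus]

p_star = interpolate_normal_coordinates(points, mus, 0.25)
p_at_sample = interpolate_normal_coordinates(points, mus, 0.5)

print(p_star, np.linalg.norm(p_star))
```

By construction, the output lies on the sphere, and the sample points are reproduced at the sampled parameter values.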


Remark 7.2. There are a few facts that the practitioner needs to be aware of:

1. The interpolation procedure of Algorithm 7.1 depends on which sample point is
   selected to act as the base point. Different choices may lead to different
   interpolants.¹³

2. For matrix manifolds, the tangent space is often also given in extrinsic coordinates.
   This means that an entry-by-entry interpolation of the matrices that represent
   the tangent vectors may lead to an interpolant that is not in the tangent
   space. As an illustrative example, consider the Grassmannian $Gr(n, r)$. Matrices
   $\Delta_1, \ldots, \Delta_k \in T_{[U]} Gr(n, r)$ are characterized by $U^T \Delta_j = 0$. Entry-by-entry
   interpolation in the tangent space may potentially result in a matrix $\Delta^*$ that is not
   orthogonal to the base point $U$, i.e., $U^T \Delta^* \neq 0$; see [100, §2.4].

3. In general, because of the vector space structure of the tangent space of any
   manifold $\mathcal{M}$, it is sufficient to use an interpolation method that expresses the
   interpolant in $T_p\mathcal{M}$ as a weighted linear combination of the sampled tangent vectors
   $v_1, \ldots, v_k \in T_p\mathcal{M}$,

   $$v^* = \sum_{j=1}^{k} w_j v_j.$$

   Amongst others, linear interpolation, Lagrange and Hermite interpolation, spline
   interpolation and interpolation via radial basis functions fulfill this requirement.
   As an aside, the interpolation procedure is computationally less expensive, since
   it works on the weight coefficients $w_j$ rather than on every single entry.


Quasi-linear interpolation of trajectories via geodesics 
In this paragraph, we address applications where the sampled manifold data features
a univariate parametric dependency. The setting is as follows. Let $\mathcal{M}$ be a Riemannian
manifold and suppose that there is a trajectory

$$c : [a,b] \to \mathcal{M}, \quad \mu \mapsto c(\mu)$$

on $\mathcal{M}$ that is sampled at k instants $\mu_1, \ldots, \mu_k \in [a,b]$. Then an interpolant $\hat{c}$ for $c$ can
be computed via Algorithm 7.2.

13 In the practical applications considered in [8], it was observed that the base point selection has
only a minor impact on the final result.


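Algorithm 7.2: Geodesic interpolation.

Input: Data set $\{c(\mu_1), \ldots, c(\mu_k)\} \subset \mathcal{M}$ sampled from a curve $c : \mu \mapsto c(\mu)$, unsampled
instant $\mu^* \in [\mu_j, \mu_{j+1}]$.
1: Compute $v_{j+1} := \mathrm{Log}_{c(\mu_j)}^{\mathcal{M}}(c(\mu_{j+1})) \in T_{c(\mu_j)}\mathcal{M}$.
2: Compute $\hat{c}(\mu^*) := \mathrm{Exp}_{c(\mu_j)}^{\mathcal{M}}\Big(\frac{\mu^* - \mu_j}{\mu_{j+1} - \mu_j} v_{j+1}\Big)$.
Output: $\hat{c}(\mu^*) \in \mathcal{M}$ interpolant of $c(\mu^*)$.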


The interpolants at $\mu \in [\mu_j, \mu_{j+1}]$ that are output by Algorithm 7.2 lie on the unique
geodesic connection between the points $c(\mu_j)$ and $c(\mu_{j+1})$. Hence, it is the straightforward
manifold analogue of linear interpolation and is base-point independent.

The generic formulation of Algorithm 7.1 allows one to employ higher-order interpolation
methods. However, this does not necessarily lead to more accurate results:
the overall error depends not only on the interpolation error within the tangent space
but also on the distortion caused by mapping the data to a selected (fixed) tangent
space; see Figure 7.3.
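
Algorithm 7.2 admits an equally short sketch on the unit sphere. As before, NumPy and the closed-form exponential and logarithm of the sphere are assumptions of this illustration, and the function `geodesic_interpolation` and the sample data are hypothetical.

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential on the unit sphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Riemannian logarithm on the unit sphere (q not antipodal to p)."""
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(p @ q, -1.0, 1.0)) * w / nw

def geodesic_interpolation(samples, mus, mu_star):
    mus = np.asarray(mus, dtype=float)
    # Locate the interval [mu_j, mu_{j+1}] that contains mu_star.
    j = int(np.clip(np.searchsorted(mus, mu_star, side="right") - 1, 0, len(mus) - 2))
    v = sphere_log(samples[j], samples[j + 1])             # step 1
    t = (mu_star - mus[j]) / (mus[j + 1] - mus[j])
    return sphere_exp(samples[j], t * v)                   # step 2

def sample_curve(mu):
    """Hypothetical trajectory on the sphere, used to generate sample data."""
    a, b = 0.5 + 0.3 * mu, mu
    return np.array([np.sin(a) * np.cos(b), np.sin(a) * np.sin(b), np.cos(a)])

mus = [0.0, 0.5, 1.0]
samples = [sample_curve(mu) for mu in mus]

c_hat = geodesic_interpolation(samples, mus, 0.25)
c_end = geodesic_interpolation(samples, mus, 1.0)

# On the sphere, the geodesic midpoint of p and q is the normalized sum (p + q)/||p + q||.
midpoint = (samples[0] + samples[1]) / np.linalg.norm(samples[0] + samples[1])
print(np.linalg.norm(c_hat - midpoint))
```

The base point is moved from interval to interval, so each evaluation only involves the two neighboring samples.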


Figure 7.3: Illustration of the course of action of Algorithms 7.1 and 7.2. Algorithm 7.1 (right) first
maps all data points to a selected fixed tangent space. In Algorithm 7.2 (left), two points $p_j = c(\mu_j)$
and $p_{j+1} = c(\mu_{j+1})$ are connected by a geodesic line, then the base is shifted to point $p_{j+1}$ and the
procedure is repeated.


Algorithms 7.1 and 7.2 can be applied in practical applications where the Riemannian
exponential and logarithm mappings are known in explicit form. Applications in
parametric model reduction that consider matrix manifolds include [34] (GL(n)-data),
[8, 76, 100] (Grassmann data), [104] (Stiefel data) and [9, 82] (SPD(n) data).


7.3.2 Interpolation via the Riemannian center of mass 


As pointed out in Remark 7.2, interpolation of manifold data via the back and forth
mapping of a complete data set of sample points between the manifold and its tangent
space depends on the chosen base point. As a consequence, sample points may
experience an uneven distortion under the projection onto the tangent space; see
Figure 7.3 (right). An approach that avoids this issue is to interpret interpolation as the
task of finding suitably weighted Riemannian centers of mass. This concept was
introduced in the context of geodesic finite elements in [44, 91].

The idea is as follows: The Riemannian center of mass¹⁴ or Fréchet mean of a sample
data set $\{p_1, \ldots, p_k\} \subset \mathcal{M}$ on a manifold with respect to the scalar weights $w_i > 0$,
$\sum_i w_i = 1$, is defined as the minimizer(s) of the Riemannian objective function

$$\mathcal{M} \ni q \mapsto f(q) = \frac{1}{2} \sum_{i=1}^{k} w_i \, \mathrm{dist}(q, p_i)^2 \in \mathbb{R},$$

where $\mathrm{dist}(q, p_i)$ is the Riemannian distance of (7.5). This definition generalizes the
notion of the barycentric mean in Euclidean spaces. However, on curved manifolds,
the global center might not be unique. Moreover, local minimizers may appear. For
more details, see [58] and [4], which also give uniqueness criteria.

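Interpolation is now performed by computing weighted Riemannian centers.
More precisely, let $\mu_1, \ldots, \mu_k \in \mathbb{R}^d$ be sampled parameter locations and let $p_i = p(\mu_i) \in
\mathcal{M}$, $i = 1, \ldots, k$, be the corresponding sample locations on $\mathcal{M}$. Interpolation is within
the convex hull $\mathrm{conv}\{\mu_1, \ldots, \mu_k\} \subset \mathbb{R}^d$ of the samples.

Let $\{\varphi_i : \mu \mapsto \varphi_i(\mu) \mid i = 1, \ldots, k\}$ be a suitable set of interpolation functions with
$\varphi_i(\mu_j) = \delta_{ij}$, say Lagrange polynomials [91], splines [44] or radial basis functions [26]. Then the
interpolant $p^* = p(\mu^*) \in \mathcal{M}$ at an unsampled parameter location $\mu^* \in \mathrm{conv}\{\mu_1, \ldots, \mu_k\}$
is defined as the minimizer

$$p^* = \operatorname*{argmin}_{q \in \mathcal{M}} f(q) = \frac{1}{2} \sum_{i=1}^{k} \varphi_i(\mu^*) \, \mathrm{dist}(q, p_i)^2. \qquad (7.12)$$

At a sample location $\mu_j$, one has indeed that $\sum_{i=1}^{k} \varphi_i(\mu_j) \, \mathrm{dist}(q, p_i)^2 = \sum_{i=1}^{k} \delta_{ij} \, \mathrm{dist}(q, p_i)^2 =
\mathrm{dist}(q, p_j)^2$, which has the unique global minimum at $q = p_j$.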
Computing $p^*$ requires one to solve a Riemannian optimization problem. The simplest
approach is a gradient descent method [3, 4]. The gradient of the objective function
$f$ in (7.12) is

$$\nabla f_q = -\sum_{i=1}^{k} \varphi_i(\mu^*) \, \mathrm{Log}_q^{\mathcal{M}}(p_i) \in T_q\mathcal{M}, \qquad (7.13)$$

see [58, Thm 1.2], [4, §2.1.5], [91, eq. (2.4)]. Hence, just like interpolation in the tangent
space, the interpolation via the Riemannian center can be pursued only in applications
where the Riemannian logarithm can be computed. A generic gradient descent
algorithm to compute the barycentric interpolant for a function $p : \mathbb{R}^d \ni \mu \mapsto p(\mu) \in \mathcal{M}$
reads as follows.

14 Here, we introduce this for discrete data sets; for centers w.r.t. a general mass distribution, see the
original paper [58], Section 1.


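Algorithm 7.3: Interpolation via the weighted Riemannian center [4, 84].

Input: Sample data set $\{p_1 = p(\mu_1), \ldots, p_k = p(\mu_k)\} \subset \mathcal{M}$, unsampled parameter
location $\mu^* \in \operatorname{conv}(\mu_1, \ldots, \mu_k) \subset \mathbb{R}^d$, initial guess $q_0$, convergence threshold $\tau$.
1: $k := 0$
2: Compute $\nabla f_{q_k}$ according to (7.13)
3: while $\|\nabla f_{q_k}\|_{q_k} > \tau$ do
4:     select a step size $\alpha_k$
5:     $q_{k+1} := \operatorname{Exp}^{\mathcal{M}}_{q_k}(-\alpha_k \nabla f_{q_k})$
6:     $k := k + 1$
7: end while
Output: $\hat{p}^* := q_k \in \mathcal{M}$ interpolant of $p(\mu^*)$.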


An implementation of this (type of) method for finding the Karcher mean in SO(3) is 
discussed in [84]. Of course, Riemannian analogues to more sophisticated nonlinear 
optimization methods may also be employed; see [3]. 
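
To make Algorithm 7.3 concrete, the following is a minimal sketch on SO(3), not the implementation of [84]. It assumes the exponential and logarithm formulas (7.24) and (7.25) for the orthogonal group, a fixed step size `alpha = 1`, and the Frobenius norm for the stopping test; the function names `so_exp`, `so_log`, `weighted_center` and `rot_z` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
from scipy.linalg import expm, logm


def so_exp(Q, Delta):
    """Riemannian exponential on SO(n), cf. (7.24): Q expm(Q^T Delta)."""
    return Q @ expm(Q.T @ Delta)


def so_log(Q, Q_tilde):
    """Riemannian logarithm on SO(n), cf. (7.25): Q logm(Q^T Q_tilde)."""
    return Q @ np.real(logm(Q.T @ Q_tilde))


def weighted_center(points, weights, q0, tau=1e-10, alpha=1.0, max_iter=200):
    """Gradient descent for the weighted Riemannian center (sketch of Alg. 7.3).

    alpha is a fixed step size (an assumption; the algorithm leaves it open).
    """
    q = q0
    for _ in range(max_iter):
        # Gradient (7.13): minus the weighted sum of the logarithms at q.
        grad = -sum(w * so_log(q, p) for w, p in zip(weights, points))
        if np.linalg.norm(grad) <= tau:
            break
        q = so_exp(q, -alpha * grad)
    return q


def rot_z(theta):
    """Rotation about the z-axis by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Three samples on a one-parameter subgroup; the weights play the role of
# the interpolation functions phi_i(mu^*) evaluated at one location mu^*.
points = [rot_z(0.0), rot_z(0.4), rot_z(1.0)]
weights = [0.2, 0.3, 0.5]
center = weighted_center(points, weights, q0=points[0])

# With weights (0, 1, 0) the center reproduces the second sample.
center_at_sample = weighted_center(points, [0.0, 1.0, 0.0], q0=points[0])
```

For samples on a common one-parameter subgroup, the center is the rotation by the weighted average angle, which gives a simple consistency check for the sketch.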

In the context of model reduction, the benefits of interpolation via weighted
Riemannian centers must be weighed against the computational costs of solving the
associated Riemannian optimization problem.


7.3.3 Additional approaches 


A large variety of sophisticated ideas and further manifold interpolation techniques 
exist in the literature: The acceleration-minimizing property of cubic splines in the Eu- 
clidean space can be generalized to Riemannian manifolds in the form of a variational 
problem [24, 27, 33, 57, 77, 88, 93]; see also [81] and the references therein. Moreover, 
the construction concepts of Bézier curves and the De Casteljau algorithm [15] can 
be transferred to Riemannian manifolds [1, 62, 81, 75, 89]. Bézier curves in Euclidean 
spaces are polynomial splines that rely on a number of so-called control points. To 
obtain the value of a Bézier curve at time t, a recursive sequence of straight-line con- 
vex combinations between pairs of control points must be computed. The transition 
of this technique to Riemannian manifolds is via replacing the inherent straight lines 
with geodesics [81]. Another option is to conduct the Bézier—De Casteljau algorithm in 
the tangent space and to transfer the results to the manifold via a geodesic averaging 
of the spline arcs that were constructed in the tangent spaces at the first and the last 
control point, respectively; see [43]. 

Derivative information may also be incorporated in manifold interpolation
schemes. A Hermite-type method that is specifically tailored for interpolation
problems on the Grassmann manifold is sketched in [7, §3.7.4]. General Hermite manifold
interpolation in compact, connected Lie groups with a bi-invariant metric has been
considered in [55]. A practical approach to first-order Hermite interpolation of data on
arbitrary Riemannian manifolds is discussed in [103].


7.3.4 Quasi-linear extrapolation on matrix manifolds 


In application scenarios where both snapshot data of the full-order model and
derivative information are at hand, various approaches have been suggested to exploit the
latter. On the one hand, derivatives can be used for improving the ROM's accuracy
and approximation quality by constructing POD bases that incorporate snapshots and
snapshot derivatives [28, 51, 54, 99]. On the other hand, snapshot derivatives make it
possible to parameterize the ROM bases and subspaces or to perform sensitivity analyses
[48, 47, 97, 101]. In this section, we outline an approach to transferring the idea of
extrapolation and parameterization via local linearizations to manifold-valued functions. The
underlying idea is comparable to the trajectory piecewise linear (TPWL) method; see
[85] and Chapter 3 of this volume. Yet, TPWL linearizes the full-order model prior to
the ROM projection, whereas here we consider linearizing ROM building blocks like
the reduced orthogonal bases, reduced subspaces or reduced system matrices.


A geometric first-order Taylor approximation
Any differentiable function $f : \mathbb{R}^n \to \mathbb{R}^n$ can be linearized via a first-order Taylor
expansion. A step ahead of size $t$ in direction $d \in \mathbb{R}^n$ gives
$f(x_0 + td) = f(x_0) + t\,Df_{x_0}(d) + \mathcal{O}(t^2)$. When considering $t \mapsto c(t) := f(x_0 + td)$ as a
curve, then the first-order Taylor approximant is the straight line $g : t \mapsto c(0) + \dot{c}(0)\,t$.
Such a first-order linearization often serves for extrapolating a given nonlinear function
in a neighborhood of a selected expansion point. For doing so, the starting point $c(0)$
and the starting velocity $\dot{c}(0)$ must be available. This procedure translates to the
manifold setting when straight lines are replaced with geodesics.

Let $\mu \in \mathbb{R}$ be a scalar parameter and let $c : \mu \mapsto c(\mu) \in \mathcal{M}$ be a curve on a
submanifold $\mathcal{M}$. For given initial values $c(\mu_0) = p_0 \in \mathcal{M}$ and
$\dot{c}(\mu_0) = v_0 \in T_{p_0}\mathcal{M}$, the corresponding unique geodesic $c_{p_0,v_0}$ is expressed via the
Riemannian exponential as

\[ c_{p_0,v_0} : \mu \mapsto \operatorname{Exp}^{\mathcal{M}}_{p_0}(\mu v_0). \]


Algorithm 7.4: Geodesic extrapolation.

Input: Scalar parameter $\mu_0 \in \mathbb{R}$, initial values $c(\mu_0) \in \mathcal{M}$, $\dot{c}(\mu_0) \in T_{c(\mu_0)}\mathcal{M}$ sampled
from a curve $c : \mu \mapsto c(\mu) \in \mathcal{M}$, parameter value $\mu^* > 0$.
1: Compute $\hat{c}(\mu_0 + \mu^*) := \operatorname{Exp}^{\mathcal{M}}_{c(\mu_0)}(\mu^* \dot{c}(\mu_0))$
Output: $\hat{c}(\mu_0 + \mu^*) \in \mathcal{M}$ extrapolant of $c(\mu_0 + \mu^*)$.


Example: extrapolating POD basis matrices

As outlined in Section 7.1.1, snapshot POD works by collecting state vector snapshots
$x^1 := x(t_1, \mu_0), \ldots, x^m := x(t_m, \mu_0) \in \mathbb{R}^n$, followed by an SVD of the snapshot
matrix $(x^1, \ldots, x^m)(\mu_0) =: S(\mu_0) = U(\mu_0)\Sigma(\mu_0)Z^T(\mu_0)$. Here, the matrix dimensions are
$U(\mu_0) \in \mathbb{R}^{n \times m}$, $\Sigma(\mu_0) \in \mathbb{R}^{m \times m}$, $Z(\mu_0) \in \mathbb{R}^{m \times m}$. The objective is to
approximate $U(\mu_0 + \mu)$ for a small $\mu > 0$ based on the data $U(\mu_0)$, $\dot{U}(\mu_0)$, where $U(\mu_0)$ is a
point on the Stiefel manifold St(n, m) and $\dot{U}(\mu_0)$ is a tangent vector; see Section 7.4.4.1.

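Differentiating the SVD. If the snapshot matrix function $\mu \mapsto S(\mu) \in \mathbb{R}^{n \times m}$ is smooth
in the neighborhood of $\mu_0 \in \mathbb{R}$ and if the singular values of $S(\mu_0)$ are mutually
distinct,¹⁵ then the singular values and both the left and the right singular vectors are
differentiable in $\mu \in [\mu_0 - \delta\mu, \mu_0 + \delta\mu]$ for $\delta\mu$ small enough. For brevity, let
$\dot{S} = \frac{dS}{d\mu}(\mu_0)$ denote the derivative with respect to $\mu$ evaluated in $\mu_0$ and so forth. Let
$\mu \mapsto S(\mu) = U(\mu)\Sigma(\mu)Z(\mu)^T \in \mathbb{R}^{n \times m}$ and let $C(\mu) = (S^T S)(\mu)$. Let $u^j$ and $v^j$,
$j = 1, \ldots, m$, denote the columns of $U(\mu_0)$ and $Z(\mu_0)$, respectively. We have

\[ \dot{\sigma}_j = (u^j)^T \dot{S} v^j \quad (j = 1, \ldots, m), \tag{7.14} \]

\[ \dot{Z} = ZA, \quad \text{where } A_{ij} =
   \begin{cases}
     \dfrac{\sigma_i (u^i)^T \dot{S} v^j + \sigma_j (u^j)^T \dot{S} v^i}{(\sigma_i + \sigma_j)(\sigma_j - \sigma_i)}, & i \neq j, \\
     0, & i = j
   \end{cases}
   \quad (i, j = 1, \ldots, m), \tag{7.15} \]

\[ \dot{U} = \dot{S} Z \Sigma^{-1} + S \dot{Z} \Sigma^{-1} + S Z \tfrac{d}{d\mu}\bigl(\Sigma^{-1}\bigr)
          = \bigl(\dot{S} Z + U(\Sigma A - \dot{\Sigma})\bigr)\Sigma^{-1}. \tag{7.16} \]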


A proof can be found in [48]. Note that $U^T(\mu_0)\dot{U}(\mu_0)$ is skew-symmetric so that indeed
$\dot{U}(\mu_0) =: \Delta(\mu_0) \in T_{U(\mu_0)}\mathrm{St}(n, m)$. The above equations hold in approximate form for
the truncated SVD. For convenience, assume that $U(\mu_0) \in \mathrm{St}(n, r)$ is now truncated
to r < m columns.
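
The formulas (7.14)–(7.16) translate almost literally into array operations. The sketch below is an illustration under stated assumptions, not the implementation of [48]: the snapshot path `S(mu) = S0 + mu * S1` is a made-up linear test case with prescribed singular values 3, 2, 1, and `svd_derivative` is a hypothetical helper name. The result is compared with a central finite difference, where the sign ambiguity of the numerically computed singular vectors is removed by aligning them with the vectors at the expansion point.

```python
import numpy as np


def svd_derivative(S, S_dot):
    """Derivatives of the thin SVD S = U diag(s) Z^T following (7.14)-(7.16).

    Assumes mutually distinct, nonzero singular values.
    """
    U, s, Zt = np.linalg.svd(S, full_matrices=False)
    Z = Zt.T
    G = U.T @ S_dot @ Z                    # G[i, j] = (u^i)^T S_dot v^j
    s_dot = np.diag(G).copy()              # (7.14)
    num = s[:, None] * G + G.T * s[None, :]            # sigma_i G_ij + sigma_j G_ji
    den = (s[:, None] + s[None, :]) * (s[None, :] - s[:, None])
    np.fill_diagonal(den, 1.0)             # avoid 0/0; diagonal is reset below
    A = num / den
    np.fill_diagonal(A, 0.0)               # (7.15): A_ii = 0
    Z_dot = Z @ A
    # (7.16); right-multiplication by Sigma^{-1} is a column scaling.
    U_dot = (S_dot @ Z + U @ (np.diag(s) @ A - np.diag(s_dot))) / s[None, :]
    return U, s, Z, U_dot, s_dot, Z_dot


# Hypothetical test data: S(mu) = S0 + mu * S1.
rng = np.random.default_rng(0)
n, m = 8, 3
U0, _ = np.linalg.qr(rng.standard_normal((n, m)))
Z0, _ = np.linalg.qr(rng.standard_normal((m, m)))
S0 = U0 @ np.diag([3.0, 2.0, 1.0]) @ Z0.T
S1 = rng.standard_normal((n, m))

U, s, Z, U_dot, s_dot, Z_dot = svd_derivative(S0, S1)


def left_vectors(mu):
    """Left singular vectors of S(mu), sign-aligned with those at mu = 0."""
    Uh = np.linalg.svd(S0 + mu * S1, full_matrices=False)[0]
    return Uh * np.sign(np.sum(Uh * U, axis=0))


h = 1e-6
U_fd = (left_vectors(h) - left_vectors(-h)) / (2 * h)
skew_defect = np.linalg.norm(U.T @ U_dot + U_dot.T @ U)
```

The quantity `skew_defect` measures how far $U^T \dot{U}$ is from being skew-symmetric and should vanish up to rounding errors.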

Performing the Taylor extrapolation on St(n, r). With $U(\mu_0)$, $\dot{U}(\mu_0)$ at hand,
$U(\mu_0 + \mu)$ can be approximated using the Stiefel exponential:
$U(\mu_0 + \mu) \approx \hat{U}(\mu_0 + \mu) := \operatorname{Exp}^{\mathrm{St}}_{U(\mu_0)}(\mu \dot{U}(\mu_0))$; see Algorithm 7.7. The process is
illustrated in Figure 7.4.

Note that when the $\mu$-dependency is real-analytic, then the Euclidean Taylor
expansion

\[ U(\mu_0 + \mu) = U(\mu_0) + \mu \dot{U}(\mu_0) + \frac{\mu^2}{2} \ddot{U}(\mu_0) + \mathcal{O}(\mu^3) \in \mathrm{St}(n, r) \tag{7.17} \]

converges to an orthogonal matrix $U(\mu_0 + \mu) \in \mathrm{St}(n, r)$. Yet, when truncating the Taylor
series, we leave the Stiefel manifold. In particular, the columns of the first-order
approximation are not orthonormal, i.e. $U(\mu_0) + \mu \dot{U}(\mu_0) \notin \mathrm{St}(n, r)$ for $\mu \neq 0$. By
construction, the Stiefel geodesic features the same starting velocity $\dot{U}(\mu_0)$ and thus matches
the Taylor series up to terms of second order. In addition, it respects the geometric
structure of the Stiefel manifold and thus preserves column-orthonormality for every $\mu$.


15 This condition can be relaxed; see the results of [5, §7].

Figure 7.4: Extrapolation of matrix manifold data. Sketched on the right is the sample matrix data in
$\mathbb{R}^{n \times r}$. The curved line on the left represents the nonlinear matrix manifold; the straight lines
represent the tangent vectors in the tangent space. The matrix curve is linearized at $U(q_0)$, $U(q_1)$, etc.


7.4 Matrix manifolds of practical importance 


In this section, we discuss the matrix manifolds that feature most often in practical
applications in the context of model reduction. For each manifold under consideration,
we recap, if applicable,
- the representation of points/locations in numerical schemes;
- the representation of tangent vectors in numerical schemes;
- the most common Riemannian metrics;
- how to compute distances, geodesics and the Riemannian exponential and
  logarithm mappings.


7.4.1 The general linear group 


This section is devoted to the general linear group GL(n) of invertible square matrices. 
In model reduction, regular matrices appear for example as (reduced) system matri- 
ces in LTI and discretized PDE systems [9, 34, 78] and parameterizations have to be 
such that matrix regularity is preserved. In addition, the discussion of the seemingly 
simple matrix manifold GL(n) is important, because it is the fundamental matrix Lie 
group from which all other matrix Lie groups are derived. Moreover, it provides the 
background for understanding quotient spaces of GL(n); see Subsection 7.2.5 and also 
[23, 96]. A short summary on the Riemannian geometry of GL(n) is given in [83, 86]. 


7.4.1.1 Introduction and data representation in numerical schemes 


Because $\mathrm{GL}(n) = \det^{-1}(\mathbb{R} \setminus \{0\}) = \{A \in \mathbb{R}^{n \times n} \mid \det(A) \neq 0\}$, GL(n) is an open subset
of the $n^2$-dimensional vector space $\mathbb{R}^{n \times n} \cong \mathbb{R}^{n^2}$ and is thus an $n^2$-dimensional
differentiable manifold; see [66, Examples 1.22-1.27]. The matrix manifold GL(n) is disconnected as it
decomposes into two connected components, namely the regular matrices of positive
determinant and the regular matrices of negative determinant.

Because GL(n) is an open subset of the vector space $\mathbb{R}^{n \times n}$, the tangent space at a
location $A \in \mathrm{GL}(n)$ is simply $T_A\mathrm{GL}(n) = \mathbb{R}^{n \times n}$. For GL(n), the Lie algebra is
$\mathfrak{gl}(n) = \mathbb{R}^{n \times n}$, so that the Lie group exponential is the standard matrix exponential
$\exp_m : \mathbb{R}^{n \times n} = \mathfrak{gl}(n) \to \mathrm{GL}(n)$. From the Lie group perspective (7.10), the tangent space
at an arbitrary point $A \in \mathrm{GL}(n)$ is to be considered as the set
$T_A\mathrm{GL}(n) = A\,\mathfrak{gl}(n) = A(\mathbb{R}^{n \times n})$, even though this set coincides with $\mathbb{R}^{n \times n}$.


7.4.1.2 Distances and geodesics 


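The obvious choice for a Riemannian metric on GL(n) is to use the inner product from
the ambient Euclidean matrix space, i.e.,

\[ \langle \Delta, \tilde{\Delta} \rangle_A = \langle \Delta, \tilde{\Delta} \rangle_0 = \operatorname{trace}(\Delta^T \tilde{\Delta}), \]

for $A \in \mathrm{GL}(n)$ and $\Delta, \tilde{\Delta} \in T_A\mathrm{GL}(n) = \mathbb{R}^{n \times n}$.

In many applications, it is more appropriate to consider metrics with certain
invariance properties.¹⁶ A left-invariant metric can be obtained from the standard metric
via

\[ \langle \Delta, \tilde{\Delta} \rangle_A = \langle A^{-1}\Delta, A^{-1}\tilde{\Delta} \rangle_0, \quad A \in \mathrm{GL}(n), \; \Delta, \tilde{\Delta} \in T_A\mathrm{GL}(n). \tag{7.18} \]

When formally considering $\Delta = AV$, $\tilde{\Delta} = A\tilde{V} \in T_A\mathrm{GL}(n) = A\,\mathfrak{gl}(n)$ as left-translates of
tangent vectors $V, \tilde{V} \in T_I\mathrm{GL}(n) = \mathfrak{gl}(n)$, then this metric satisfies
$\langle \Delta, \tilde{\Delta} \rangle_A = \langle V, \tilde{V} \rangle_0$. Alternatively, $\langle V, \tilde{V} \rangle_I = \langle AV, A\tilde{V} \rangle_A$, which explains the
name 'left-invariant'.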


The Riemannian exponential and logarithm for the flat metric

When equipped with the Euclidean metric, GL(n) is flat: since the tangent space is
the full matrix space $\mathbb{R}^{n \times n}$, the geodesic equation (7.7) requires the acceleration of a
geodesic curve to vanish completely. Hence, the geodesic that starts from $A \in \mathrm{GL}(n)$
with velocity $\Delta \in \mathbb{R}^{n \times n}$ is the straight line $C(t) = A + t\Delta$. Note that the curve
$t \mapsto C(t)$ may leave the manifold GL(n) for some $t \in \mathbb{R}$ as it may hit a matrix with zero
determinant. The formulas for the Riemannian exponential and logarithm mapping at a base
point $A \in \mathrm{GL}(n)$ are

\[ \operatorname{Exp}^{\mathrm{GL}}_A : T_A\mathrm{GL}(n) \supset B_\epsilon(0) \to \mathrm{GL}(n), \quad \Delta \mapsto \tilde{A} := A + \Delta, \tag{7.19} \]

\[ \operatorname{Log}^{\mathrm{GL}}_A : \mathrm{GL}(n) \to T_A\mathrm{GL}(n), \quad \tilde{A} \mapsto \Delta := \tilde{A} - A. \tag{7.20} \]

In (7.19), $B_\epsilon(0)$ denotes a suitably small open neighborhood around
$0 \in T_A\mathrm{GL}(n) = \mathbb{R}^{n \times n}$ such that $A + \Delta \in \mathrm{GL}(n)$ for all $\Delta \in B_\epsilon(0)$.


16 “Eulerian motion of a rigid body can be described as motion along geodesics in the group of
rotations of three-dimensional euclidean space provided with a left-invariant Riemannian metric. A
significant part of Euler’s theory depends only upon this invariance, and therefore can be extended to
other groups.” [11, Appendix 2, p. 318].


The Riemannian exponential for the left-invariant metric on GL(n)

The left-invariant metric induces a non-flat geometry on GL(n). Formulae for the
covariant derivatives and the corresponding geodesics are derived in [10, Thm. 2.14]. The
counterparts w.r.t. the right-invariant metrics can be found in [96]. Given a base point
$A \in \mathrm{GL}(n)$ and a starting velocity $\Delta = AV \in T_A\mathrm{GL}(n) = A\,\mathfrak{gl}(n)$, the associated geodesic
is

\[ \Gamma_{A,\Delta} : t \mapsto A \exp_m(tV^T) \exp_m\bigl(t(V - V^T)\bigr). \tag{7.21} \]

The Riemannian exponential is

\[ \operatorname{Exp}^{\mathrm{GL}}_A(\Delta) = \Gamma_{A,\Delta}(1) = A \exp_m(V^T) \exp_m(V - V^T)
   = A \exp_m\bigl((A^{-1}\Delta)^T\bigr) \exp_m\bigl((A^{-1}\Delta) - (A^{-1}\Delta)^T\bigr). \tag{7.22} \]

The author is not aware of a closed formula for the inverse map, i.e., the Riemannian
logarithm for the left-invariant metric; see also the discussion in [96, §4.5]. The thesis
[83, §6.2] introduces a Riemannian shooting method for computing the Riemannian
logarithm w.r.t. the left-invariant metric.


An important special case

For tangent vectors $\Delta = AV \in T_A\mathrm{GL}(n)$ with normal $V \in \mathbb{R}^{n \times n}$, i.e., $VV^T = V^TV$,
we have that the matrices $V^T$ and $(V - V^T)$ commute. Therefore, according to (7.36),
$A \exp_m(V^T) \exp_m(V - V^T) = A \exp_m(V^T + V - V^T) = A \exp_m(V)$ and the Riemannian
exponential reduces to

\[ \operatorname{Exp}^{\mathrm{GL}}_A : T_A\mathrm{GL}(n) \cap \{\Delta \mid A^{-1}\Delta \text{ normal}\} \to \mathrm{GL}(n), \quad \Delta \mapsto \tilde{A} = A \exp_m(A^{-1}\Delta). \]

The Riemannian logarithm is

\[ \operatorname{Log}^{\mathrm{GL}}_A : D_A \cap \{\tilde{A} \mid A^{-1}\tilde{A} \text{ normal}\} \to T_A\mathrm{GL}(n), \quad \tilde{A} \mapsto \Delta = A \log_m(A^{-1}\tilde{A}), \]

where $D_A \subset \mathrm{GL}(n)$ is a domain such that a suitable branch of the matrix logarithm
is well-defined. These expressions are sometimes encountered in the literature as the
Riemannian exponential and logarithm mappings. Yet, one should be aware of the
fact that they hold only under these special circumstances.
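
The difference between the general formula (7.22) and the special case for normal $V$ can be checked numerically. The following sketch assumes SciPy's `expm` and `logm` as stand-ins for $\exp_m$ and $\log_m$; the function names and the test matrices are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.linalg import expm, logm


def gl_exp_left_invariant(A, Delta):
    """Riemannian exponential (7.22) for the left-invariant metric on GL(n)."""
    V = np.linalg.solve(A, Delta)          # V = A^{-1} Delta
    return A @ expm(V.T) @ expm(V - V.T)


def gl_exp_normal(A, Delta):
    """Special case A expm(A^{-1} Delta); valid only if A^{-1} Delta is normal."""
    return A @ expm(np.linalg.solve(A, Delta))


def gl_log_normal(A, A_tilde):
    """Special case A logm(A^{-1} A_tilde) with the principal matrix logarithm."""
    return A @ np.real(logm(np.linalg.solve(A, A_tilde)))


rng = np.random.default_rng(1)
n = 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # an invertible base point

# Normal V: a multiple of the identity plus a skew-symmetric matrix.
R = rng.standard_normal((n, n))
V_normal = 0.3 * np.eye(n) + 0.2 * (R - R.T)
Delta_normal = A @ V_normal

# Non-normal V: a nilpotent matrix.
V_nonnormal = np.zeros((n, n))
V_nonnormal[0, 1] = 1.0
Delta_nonnormal = A @ V_nonnormal

same_for_normal = np.allclose(
    gl_exp_left_invariant(A, Delta_normal), gl_exp_normal(A, Delta_normal))
same_for_nonnormal = np.allclose(
    gl_exp_left_invariant(A, Delta_nonnormal), gl_exp_normal(A, Delta_nonnormal))
round_trip = gl_log_normal(A, gl_exp_normal(A, Delta_normal))
```

In this setup `same_for_normal` is expected to hold, while the nilpotent direction shows that the simplified expression is not the Riemannian exponential in general.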
7.4.2 The orthogonal group


This section is devoted to the orthogonal group $O(n) \subset \mathbb{R}^{n \times n}$ of orthogonal n-by-n
matrices. In parametric model reduction, such matrices may appear as eigenvector matrices
in symmetric EVD problems.


7.4.2.1 Introduction and data representation in numerical schemes 


The orthogonal group is $O(n) = \{Q \in \mathbb{R}^{n \times n} \mid QQ^T = I = Q^TQ\}$. The manifold structure
of O(n) can be established via Theorem 7.1; see also Example 7.1. The orthogonal group
decomposes into two connected components, namely the orthogonal matrices with
determinant 1 and the orthogonal matrices with determinant −1. The former constitute
the special orthogonal group $SO(n) = \{Q \in O(n) \mid \det(Q) = 1\}$. The orthogonal group
is a closed subgroup of the Lie group GL(n) and thus itself a Lie group (Section 7.2.5).
The tangent space $T_IO(n)$ at the identity forms the Lie algebra associated with the Lie
group O(n). It coincides with the Lie algebra of SO(n) and as such is denoted by
$\mathfrak{so}(n) = T_ISO(n) = T_IO(n)$, [46, §3.3, 3.4]. The Lie algebra of SO(n) is precisely the vector space
of skew-symmetric matrices, $\mathfrak{so}(n) = \operatorname{skew}(n)$. According to (7.10), the tangent space
at an arbitrary location $Q$ is given by the translates (by left-multiplication) of the Lie
algebra

\[ T_QO(n) = Q\,\mathfrak{so}(n) = \{\Delta = QV \in \mathbb{R}^{n \times n} \mid V \in \operatorname{skew}(n)\}, \]

which is the same as $\{\Delta \in \mathbb{R}^{n \times n} \mid Q^T\Delta = -\Delta^TQ\}$. The Lie exponential is

\[ \exp_m|_{\mathfrak{so}(n)} : \mathfrak{so}(n) \to SO(n). \tag{7.23} \]

This restriction is a surjective map; see Appendix A. The dimensions of both $T_QO(n)$
and O(n) are $\frac{1}{2}n(n - 1)$.


7.4.2.2 Distances and geodesics 


We follow up on the discussion in Section 7.4.1.1. For the orthogonal group, the
Euclidean metric and the left-invariant metric coincide. Let
$\Delta = QV$, $\tilde{\Delta} = Q\tilde{V} \in T_QO(n) = Q\,\mathfrak{so}(n)$. Then

\[ \langle \Delta, \tilde{\Delta} \rangle_Q = \langle Q^T\Delta, Q^T\tilde{\Delta} \rangle_0 = \langle V, \tilde{V} \rangle_0
   = \operatorname{trace}(V^T\tilde{V}) = \operatorname{trace}(V^TQ^TQ\tilde{V}) = \langle \Delta, \tilde{\Delta} \rangle_0. \]

In fact, the metric is also right-invariant, which makes it a bi-invariant metric; see
[6, §2]. Bi-invariant metrics are important, because for Lie groups endowed with
bi-invariant metrics, the Lie exponential map and the Riemannian exponential map at
the identity coincide [6, Thm. 2.27, p. 40].

The Riemannian exponential and logarithm maps on O(n)

The Riemannian O(n)-exponential at a base point $Q \in O(n)$ sends a tangent vector
$\Delta \in T_QO(n)$ to the endpoint $\tilde{Q} \in O(n)$ of a geodesic that starts from $Q$ with velocity
vector $\Delta$. Therefore, it provides at the same time an expression for the geodesic curves
on O(n). A formula for computing the Riemannian O(n)-exponential was derived in
[36, §2.2.2]. Given $Q \in O(n)$, we have

\[ \operatorname{Exp}^{O}_Q : T_QO(n) \to O(n), \quad \Delta \mapsto \tilde{Q} := Q \exp_m(Q^T\Delta). \tag{7.24} \]

This result is also immediate from abstract Lie theory; see [6, Eq. (2.2) & Thm. 2.27].¹⁷

The corresponding Riemannian logarithm on O(n) is

\[ \operatorname{Log}^{O}_Q : O(n) \supset D_Q \to T_QO(n), \quad \tilde{Q} \mapsto \Delta := Q \log_m(Q^T\tilde{Q}) \tag{7.25} \]

and is well defined on a neighborhood $D_Q \subset O(n)$ around $Q$ such that, for all $\tilde{Q} \in D_Q$,
the orthogonal matrix $Q^T\tilde{Q}$ does not feature $\lambda = -1$ as an eigenvalue.


The Riemannian distance between orthogonal matrices

For given $Q, \tilde{Q} \in O(n)$ from the same connected component of O(n), consider the EVD
$Q^T\tilde{Q} = \Psi\Lambda\Psi^H$. Because $Q^T\tilde{Q}$ is orthogonal, we have
$\Lambda = \operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ and we assume that $\theta_1, \ldots, \theta_n \in (-\pi, \pi)$. The Riemannian
distance is

\[ \operatorname{dist}_{O(n)}(Q, \tilde{Q}) = \|\operatorname{Log}^{O}_Q(\tilde{Q})\|_Q = \|\log_m(\Lambda)\|_F = \Bigl(\sum_{k=1}^n \theta_k^2\Bigr)^{1/2}. \]

The compact Lie group SO(n) is a geodesically complete Riemannian manifold [6,
Hopf–Rinow theorem, p. 31], and any two points of SO(n) can be joined by a minimal
geodesic.
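
The maps (7.24) and (7.25) and the distance formula are short enough to be written down directly. The sketch below again uses SciPy's `expm` and `logm` for $\exp_m$ and $\log_m$; the function names `orth_exp`, `orth_log` and `orth_dist` and the random test data are hypothetical and serve only to illustrate the round trip.

```python
import numpy as np
from scipy.linalg import expm, logm


def orth_exp(Q, Delta):
    """Riemannian exponential (7.24): Q expm(Q^T Delta)."""
    return Q @ expm(Q.T @ Delta)


def orth_log(Q, Q_tilde):
    """Riemannian logarithm (7.25); requires that Q^T Q_tilde has no eigenvalue -1."""
    return Q @ np.real(logm(Q.T @ Q_tilde))


def orth_dist(Q, Q_tilde):
    """Riemannian distance: Frobenius norm of logm(Q^T Q_tilde)."""
    return np.linalg.norm(np.real(logm(Q.T @ Q_tilde)), "fro")


rng = np.random.default_rng(2)
n = 3

# Random base point in SO(3).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0

# Skew-symmetric V with Frobenius norm 1, tangent vector Delta = Q V.
K = rng.standard_normal((n, n))
V = K - K.T
V /= np.linalg.norm(V, "fro")
Delta = Q @ V

Q_tilde = orth_exp(Q, Delta)
Delta_rec = orth_log(Q, Q_tilde)
d = orth_dist(Q, Q_tilde)

# A planar rotation by 0.3 has eigenvalue angles (0.3, -0.3, 0).
c, s = np.cos(0.3), np.sin(0.3)
d_planar = orth_dist(np.eye(3), np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]))
```

Note that the distance between the identity and a planar rotation by the angle $\theta$ is $\sqrt{2}\,|\theta|$ in this metric, because both eigenvalues $e^{\pm i\theta}$ contribute to the sum.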


7.4.3 The matrix manifold of symmetric positive definite matrices 


This section is devoted to the matrix manifold SPD(n) of real, symmetric positive-definite
n-by-n matrices. In model reduction, such matrices appear for example as
(reduced) system matrices in second-order parametric ODEs. For example, in linear
structural or electrical dynamical systems, mass, stiffness and damping matrices are
usually in SPD(n), [9, §4.2]. Moreover, positive definite matrices arise as Gramians of
reachable and observable LTI systems in the context of balanced truncation; see
Chapter 2 of this volume.


17 The Lie exponential is $\exp_m|_{\mathfrak{so}(n)} : \mathfrak{so}(n) \to SO(n)$, which is in the case at hand the Riemannian
exponential at the identity, $\operatorname{Exp}^{O}_I = \exp_m|_{\mathfrak{so}(n)}$. This translates to any other location via [6, Eq. (2.2)]
as follows: Pick any $Q \in SO(n)$ and consider the mapping “left-multiplication by Q”, i.e.,
$L_Q : SO(n) \to SO(n)$, $P \mapsto QP$. Then the differential is
$d(L_Q)_I : T_ISO(n) \to T_QSO(n)$, $V \mapsto \Delta := QV$. Because $L_Q$ is an isometry,

\[ Q \operatorname{Exp}^{O}_I(V) = L_Q(\operatorname{Exp}^{O}_I(V)) = \operatorname{Exp}^{O}_{L_Q(I)}\bigl(d(L_Q)_I(V)\bigr) = \operatorname{Exp}^{O}_Q(QV), \]

which gives $\operatorname{Exp}^{O}_Q(QV) = Q \operatorname{Exp}^{O}_I(V) = Q \exp_m(Q^T\Delta)$ and thus (7.24).

Related is the manifold of positive semi-definite matrices of fixed rank. It is inves- 
tigated in [23, 96, 68]. An application in model reduction features in [67]. 


7.4.3.1 Introduction and data representation in numerical schemes 


The set

\[ \mathrm{SPD}(n) = \{A \in \operatorname{sym}(n) \mid x^TAx > 0 \;\; \forall x \in \mathbb{R}^n \setminus \{0\}\} \]

is an open subset of the metric Hilbert space $(\operatorname{sym}(n), \langle \cdot, \cdot \rangle_0)$ of symmetric matrices.
As such, it is a differentiable manifold [22, §6]. Moreover, it forms a convex cone [37,
Example 2, p. 8], [71, §2.3], and can be realized as a quotient $\mathrm{SPD}(n) = \mathrm{GL}(n)/O(n)$. The
latter is based on the fact that, for $A \in \mathrm{SPD}(n)$, matrix factorizations $A = ZZ^T$ with
$Z \in \mathrm{GL}(n)$ are invariant under orthogonal transformations $Z \mapsto ZQ$, $Q \in O(n)$, [23, §2,
p. 3].

Since SPD(n) is an open subset of the vector space sym(n), the tangent space is
simply

\[ T_A\mathrm{SPD}(n) = \operatorname{sym}(n). \tag{7.26} \]

The dimensions of both $T_A\mathrm{SPD}(n)$ and SPD(n) are $\frac{1}{2}n(n + 1)$.

There is a smooth one-to-one correspondence between sym(n) and SPD(n). That
is, every positive definite matrix can be written as the matrix exponential of a unique
symmetric matrix, [39, Lem. 18.7, p. 472]. Put in different words, when restricted to
sym(n), the standard matrix exponential

\[ \exp_m : \operatorname{sym}(n) \to \mathrm{SPD}(n) \]

is a diffeomorphism; its inverse is the standard principal matrix logarithm

\[ \log_m : \mathrm{SPD}(n) \to \operatorname{sym}(n); \]

see also [12, Thm. 2.8]. The group GL(n) acts on SPD(n) via congruence transformations

\[ g_X(A) = X^TAX, \quad X \in \mathrm{GL}(n), \; A \in \mathrm{SPD}(n). \tag{7.27} \]

For additional background on SPD(n), see [72, 73, 79]. Applications in computer vision
are presented in [31, 59].


7.4.3.2 Distances and geodesics


The literature knows a large variety of distance measures on SPD(n); see [56, Table 3.1,
p. 56]. Yet, there are essentially two choices that are associated with inner products
on the tangent space of SPD(n) and thus induce Riemannian metrics on the manifold
SPD(n): the so-called natural metric and the log-Euclidean metric. Let $A \in \mathrm{SPD}(n)$ and
let $\Delta, \tilde{\Delta} \in \operatorname{sym}(n)$ be two tangent vectors.

- The natural metric is

\[ \langle \Delta, \tilde{\Delta} \rangle_A = \langle A^{-1/2}\Delta A^{-1/2}, A^{-1/2}\tilde{\Delta}A^{-1/2} \rangle_0 = \operatorname{trace}(A^{-1}\Delta A^{-1}\tilde{\Delta}), \]

  see [22, §6, p. 201], [23]. It also goes by the name trace metric, [64, §XII.1, p. 322]. In
  statistical applications, it is usually called the affine-invariant metric [70, 80].¹⁸
- The log-Euclidean metric is

\[ \langle \Delta, \tilde{\Delta} \rangle_A = \langle D(\log_m)_A(\Delta), D(\log_m)_A(\tilde{\Delta}) \rangle_0; \]

  see [12, eq. (3.5)].


For the natural metric, it is more appropriate to consider $\operatorname{sym}(n) = T_I\mathrm{SPD}(n)$ as the
tangent space at the identity and the tangent space at an arbitrary location $A \in \mathrm{SPD}(n)$
as $T_A\mathrm{SPD}(n) = A^{1/2}(T_I\mathrm{SPD}(n))A^{1/2}$ which, of course, is nothing but a
reparameterization of sym(n). From this perspective, we have for tangent vectors
$\Delta = A^{1/2}VA^{1/2}$, $\tilde{\Delta} = A^{1/2}\tilde{V}A^{1/2}$

\[ \langle \Delta, \tilde{\Delta} \rangle_A = \langle V, \tilde{V} \rangle_0. \]

The congruence transformations (7.27) are isometries of SPD(n) with respect to the
natural metric, [64, Thm. XII.1.1, p. 324], [22, Lem. 6.1.1, p. 201]. See also the discussion
in [80, §3].
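
The isometry property of the congruence transformations can be verified numerically. In the sketch below, the differential of $g_X$ maps a tangent vector $\Delta$ to $X^T\Delta X$; the function name `natural_metric` and the random test matrices are hypothetical and only illustrate the statement.

```python
import numpy as np


def natural_metric(A, D1, D2):
    """Natural (affine-invariant) metric: trace(A^{-1} D1 A^{-1} D2)."""
    return np.trace(np.linalg.solve(A, D1) @ np.linalg.solve(A, D2))


rng = np.random.default_rng(3)
n = 4

B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)                      # a point in SPD(n)

D1 = rng.standard_normal((n, n))
D1 = D1 + D1.T                               # symmetric tangent vectors
D2 = rng.standard_normal((n, n))
D2 = D2 + D2.T

X = np.eye(n) + 0.2 * rng.standard_normal((n, n))   # assumed invertible

# Congruence g_X(A) = X^T A X and its differential D -> X^T D X.
lhs = natural_metric(A, D1, D2)
rhs = natural_metric(X.T @ A @ X, X.T @ D1 @ X, X.T @ D2 @ X)

# Alternative form <V1, V2>_0 with V = A^{-1/2} D A^{-1/2}.
w, P = np.linalg.eigh(A)
A_inv_half = (P / np.sqrt(w)) @ P.T
V1 = A_inv_half @ D1 @ A_inv_half
V2 = A_inv_half @ D2 @ A_inv_half
alt = np.trace(V1.T @ V2)
```

All three values `lhs`, `rhs` and `alt` should agree up to rounding errors.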

By a standard pullback construction from differential geometry [35, Def. 2.2,
Example 2.5], the log-Euclidean metric transfers the inner product $\langle \cdot, \cdot \rangle_0$ on sym(n) to
SPD(n) via the matrix logarithm $\log_m : \mathrm{SPD}(n) \to \operatorname{sym}(n)$. In [12, eq. (3.5)], the authors
take this construction one step further and use the $\exp_m$–$\log_m$ correspondence to
define a multiplication that turns SPD(n) into a Lie group and, eventually, into a vector
space. As such, it is a flat manifold, i.e. a Riemannian manifold with zero curvature. In
this way, the computational challenges that come with dealing with data on nonlinear
manifolds are circumvented.


18 The motivation is as follows: if $y = Ax + v_0$, $A \in \mathrm{GL}(n)$, is an affine transformation of a random
vector $x$, then the mean is transformed to $\bar{y} := A\bar{x} + v_0$ and the covariance matrix undergoes a
congruence transformation $C_y = E[(y - \bar{y})(y - \bar{y})^T] = AC_xA^T$.

Which metric is to be preferred is problem-dependent; see the various contributions
in [92] and [69]. Since the natural metric arises canonically both from the geometric
approach, [64, §XII.1], and the matrix-algebraic approach [22, §6] and since staying
with the standard matrix multiplication is consistent with the setting of solving
dynamical systems in model reduction applications, we restrict the discussion of the
Riemannian exponential and logarithm to the geometry that is based on the natural
metric.


The SPD(n) exponential

The Riemannian SPD(n)-exponential at a base point $A \in \mathrm{SPD}(n)$ sends a tangent vector
$\Delta$ to the endpoint $\tilde{A} \in \mathrm{SPD}(n)$ of a geodesic that starts from $A$ with velocity vector $\Delta$.
Therefore, it provides at the same time an expression for the geodesic curves on SPD(n)
with respect to the natural metric. Formulae for computing the SPD(n)-exponential
can be found in [23], [80]. The reader preferring a matrix-analytic approach is referred
to [22, §6].


Algorithm 7.5: Riemannian SPD(n)-exponential.

Input: base point $A \in \mathrm{SPD}(n)$, tangent vector $\Delta \in T_A\mathrm{SPD}(n) = \operatorname{sym}(n)$
Output: $\tilde{A} := \operatorname{Exp}^{\mathrm{SPD}}_A(\Delta) = A^{1/2} \exp_m(A^{-1/2}\Delta A^{-1/2}) A^{1/2}$.


Here, $A^{1/2}$ denotes the matrix square root of $A$; see Appendix A.


The SPD(n) logarithm

The Riemannian SPD(n)-logarithm at a base point $A \in \mathrm{SPD}(n)$ finds for another point
$\tilde{A} \in \mathrm{SPD}(n)$ an SPD(n)-tangent vector $\Delta$ such that the geodesic that starts from $A$ with
velocity $\Delta$ reaches $\tilde{A}$ after an arc length of $\|\Delta\|_A = \sqrt{\langle \Delta, \Delta \rangle_A}$. Therefore, it provides for
two given data points $A, \tilde{A} \in \mathrm{SPD}(n)$
- a solution to the geodesic endpoint problem: a geodesic that starts from $A$ and
  ends at $\tilde{A}$;
- the Riemannian distance between the given points $A$, $\tilde{A}$.

Formulas for computing the SPD(n)-logarithm can be found in [23], [80].


Algorithm 7.6: Riemannian SPD(n)-logarithm.

Input: base point $A \in \mathrm{SPD}(n)$, location $\tilde{A} \in \mathrm{SPD}(n)$
Output: $\Delta := \operatorname{Log}^{\mathrm{SPD}}_A(\tilde{A}) = A^{1/2} \log_m(A^{-1/2}\tilde{A}A^{-1/2}) A^{1/2}$.


Both Algorithms 7.5 and 7.6 require one to compute the spectral decomposition of
n-by-n-matrices. The computational effort is $\mathcal{O}(n^3)$. In the context of parametric model
reduction, the Riemannian exponential and logarithm maps are usually required for
reduced matrix operators [9]. If n denotes the dimension of the full state vectors and
r < n denotes the dimension of the reduced state vectors, then matrix exponentials
for r-by-r-matrices are required, so that the computational effort reduces to $\mathcal{O}(r^3)$.
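
Algorithms 7.5 and 7.6 can be realized with symmetric eigenvalue decompositions only. The following sketch assumes NumPy's `eigh` for all spectral decompositions; the helper names `sym_fun`, `spd_exp`, `spd_log` and `spd_dist` are hypothetical, and the test data are random.

```python
import numpy as np


def sym_fun(A, fun):
    """Apply a scalar function to a symmetric matrix via its spectral decomposition."""
    A = 0.5 * (A + A.T)                      # remove rounding asymmetry
    w, P = np.linalg.eigh(A)
    return (P * fun(w)) @ P.T


def spd_exp(A, Delta):
    """Algorithm 7.5: A^{1/2} exp_m(A^{-1/2} Delta A^{-1/2}) A^{1/2}."""
    A_half = sym_fun(A, np.sqrt)
    A_inv_half = sym_fun(A, lambda w: 1.0 / np.sqrt(w))
    return A_half @ sym_fun(A_inv_half @ Delta @ A_inv_half, np.exp) @ A_half


def spd_log(A, A_tilde):
    """Algorithm 7.6: A^{1/2} log_m(A^{-1/2} A_tilde A^{-1/2}) A^{1/2}."""
    A_half = sym_fun(A, np.sqrt)
    A_inv_half = sym_fun(A, lambda w: 1.0 / np.sqrt(w))
    return A_half @ sym_fun(A_inv_half @ A_tilde @ A_inv_half, np.log) @ A_half


def spd_dist(A, A_tilde):
    """Riemannian distance w.r.t. the natural metric, i.e. the norm of the logarithm."""
    A_inv_half = sym_fun(A, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(sym_fun(A_inv_half @ A_tilde @ A_inv_half, np.log), "fro")


rng = np.random.default_rng(4)
r = 4
B = rng.standard_normal((r, r))
A = B @ B.T + np.eye(r)                      # base point in SPD(r)
Delta = rng.standard_normal((r, r))
Delta = 0.5 * (Delta + Delta.T)              # symmetric tangent vector

A_tilde = spd_exp(A, Delta)
Delta_rec = spd_log(A, A_tilde)
d = spd_dist(A, A_tilde)
# Length of Delta in the natural metric: sqrt(trace(A^{-1} Delta A^{-1} Delta)).
length = np.sqrt(np.trace(np.linalg.solve(A, Delta) @ np.linalg.solve(A, Delta)))
```

The endpoint `A_tilde` stays symmetric positive definite for any symmetric `Delta`, which reflects the fact that the geodesics of the natural metric never leave SPD(n).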


7.4.4 The Stiefel manifold 


This section is devoted to the Stiefel manifold $\mathrm{St}(n, r) \subset \mathbb{R}^{n \times r}$ of rectangular
column-orthogonal n-by-r matrices, r < n. Points $U \in \mathrm{St}(n, r)$ may be considered as
orthonormal bases of cardinality r, or r-frames in $\mathbb{R}^n$. In model reduction, such matrices appear
as orthogonal coordinate systems for low-order ansatz spaces that usually stem from
a proper orthogonal decomposition or a singular value decomposition of given input
solution data. Modeling data on the Stiefel manifold corresponds to data processing
for orthonormal bases and thus allows, for example, for interpolation/parameterization
of POD subspace bases. The most important use case in model reduction is where
the Stiefel matrices are tall and skinny, i.e., $r \ll n$. Interpolation problems on the
Stiefel manifold have not yet been considered in the model reduction context. The
reference [62] discusses interpolation of Stiefel data; however, using quasi-geodesics
rather than geodesics. Reference [103] includes numerical experiments for interpolating
orthogonal frames on the Stiefel manifold that rely on the canonical Riemannian
Stiefel logarithm [83, 102].


7.4.4.1 Introduction and data representation in numerical schemes 


The Stiefel manifold is the compact, homogeneous matrix manifold of column-orthogonal
rectangular matrices

\[ \mathrm{St}(n, r) := \{U \in \mathbb{R}^{n \times r} \mid U^TU = I_r\}. \]

The manifold structure can be directly established via Theorem 7.1 in a similar way as
in Example 7.1. An alternative approach is via Example 7.4, where St(n, r) is identified
with the quotient space $\mathrm{St}(n, r) = O(n)/(I_r \times O(n - r))$ under actions of the closed
subgroup
$I_r \times O(n - r) := \bigl\{ \begin{pmatrix} I_r & 0 \\ 0 & R \end{pmatrix} \mid R \in O(n - r) \bigr\} \leq O(n)$.
Two square orthogonal matrices in O(n) are identified as the same point on St(n, r), if
their first r columns coincide; see [36, §2.4].

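For any matrix representative $U \in \mathrm{St}(n, r)$, the tangent space of St(n, r) at $U$ is
represented by

\[ T_U\mathrm{St}(n, r) = \{\Delta \in \mathbb{R}^{n \times r} \mid U^T\Delta = -\Delta^TU\} \subset \mathbb{R}^{n \times r}. \]

Every tangent vector $\Delta \in T_U\mathrm{St}(n, r)$ may be written as

\[ \Delta = UA + (I - UU^T)T, \quad A \in \mathbb{R}^{r \times r} \text{ skew}, \; T \in \mathbb{R}^{n \times r} \text{ arbitrary}, \tag{7.28} \]

\[ \Delta = UA + U^\perp B, \quad A \in \mathbb{R}^{r \times r} \text{ skew}, \; B \in \mathbb{R}^{(n-r) \times r} \text{ arbitrary}, \tag{7.29} \]

where, in the latter case, $U^\perp \in \mathrm{St}(n, n - r)$ is such that $(U, U^\perp) \in O(n)$ is a square
orthogonal matrix. The dimension of both $T_U\mathrm{St}(n, r)$ and St(n, r) is $nr - \frac{1}{2}r(r + 1)$. For
additional background and applications, see [3, 21, 29, 36, 52, 95].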


7.4.4.2 Distances and geodesics 


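Let $U \in \mathrm{St}(n, r)$ be a point and let $\Delta = UA + U^\perp B$, $\tilde{\Delta} = U\tilde{A} + U^\perp\tilde{B} \in T_U\mathrm{St}(n, r)$ be
tangent vectors. There are two standard metrics on the Stiefel manifold.
- The Euclidean metric on $T_U\mathrm{St}(n, r)$ is the one inherited from the ambient $\mathbb{R}^{n \times r}$:

\[ \langle \Delta, \tilde{\Delta} \rangle_0 = \operatorname{trace}(\Delta^T\tilde{\Delta}) = \operatorname{trace} A^T\tilde{A} + \operatorname{trace} B^T\tilde{B}. \]

- The canonical metric on $T_U\mathrm{St}(n, r)$

\[ \langle \Delta, \tilde{\Delta} \rangle_U = \operatorname{trace}\bigl(\Delta^T(I - \tfrac{1}{2}UU^T)\tilde{\Delta}\bigr) = \tfrac{1}{2}\operatorname{trace} A^T\tilde{A} + \operatorname{trace} B^T\tilde{B} \]

  is derived from the quotient representation $\mathrm{St}(n, r) = O(n)/(I_r \times O(n - r))$ of the
  Stiefel manifold.

The canonical metric counts the independent coordinates¹⁹ of a tangent vector equally
when measuring the length $\sqrt{\langle \Delta, \Delta \rangle_U}$ of a tangent vector $\Delta = UA + U^\perp B$, while the
Euclidean metric disregards the skew-symmetry of $A$ [36, §2.4]. Recall that different
metrics entail different measures for the lengths of curves and thus different formulae
for geodesics.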
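
The two metrics and their coordinate expressions in terms of $A$ and $B$ can be compared directly. The sketch below builds $U$ and an orthogonal completion $U^\perp$ from a QR decomposition of a random matrix; all names and test data are hypothetical and serve only as an illustration of the representation (7.29).

```python
import numpy as np


def euclidean_metric(D1, D2):
    """Euclidean metric: trace(D1^T D2)."""
    return np.trace(D1.T @ D2)


def canonical_metric(U, D1, D2):
    """Canonical metric: trace(D1^T (I - 1/2 U U^T) D2)."""
    return np.trace(D1.T @ (D2 - 0.5 * U @ (U.T @ D2)))


rng = np.random.default_rng(5)
n, r = 7, 3

# (U, U_perp) is a square orthogonal matrix.
Q_full, _ = np.linalg.qr(rng.standard_normal((n, n)))
U, U_perp = Q_full[:, :r], Q_full[:, r:]


def random_tangent():
    """Tangent vector Delta = U A + U_perp B with skew A and arbitrary B, cf. (7.29)."""
    K = rng.standard_normal((r, r))
    A = K - K.T
    B = rng.standard_normal((n - r, r))
    return U @ A + U_perp @ B, A, B


D1, A1, B1 = random_tangent()
D2, A2, B2 = random_tangent()

tangency_defect = np.linalg.norm(U.T @ D1 + D1.T @ U)
euc = euclidean_metric(D1, D2)
can = canonical_metric(U, D1, D2)
euc_coords = np.trace(A1.T @ A2) + np.trace(B1.T @ B2)
can_coords = 0.5 * np.trace(A1.T @ A2) + np.trace(B1.T @ B2)
```

The factor 1/2 in front of the $A$-part of the canonical metric compensates for the fact that every off-diagonal entry of the skew-symmetric matrix $A$ appears twice.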


The Stiefel exponential
The Riemannian Stiefel exponential at a base point $U \in \mathrm{St}(n, r)$ sends a Stiefel tangent
vector $\Delta$ to the endpoint $\tilde{U} \in \mathrm{St}(n, r)$ of a geodesic that starts from $U$ with velocity
vector $\Delta$. Therefore, it provides at the same time an expression for geodesic curves on
St(n, r).

A closed-form expression for the Stiefel exponential w.r.t. the Euclidean metric is
included in [36, §2.2.2],

\[ \tilde{U} = \operatorname{Exp}^{\mathrm{St}}_U(\Delta) = (U, \Delta) \exp_m\begin{pmatrix} U^T\Delta & -\Delta^T\Delta \\ I_r & U^T\Delta \end{pmatrix} \begin{pmatrix} I_r \\ 0 \end{pmatrix} \exp_m(-U^T\Delta). \]

In [53], an alternative formula is derived that features only matrix exponentials of
skew-symmetric matrices. An efficient algorithm for computing the Stiefel exponential
w.r.t. the canonical metric was derived in [36, §2.4.2]:


19 That is, the upper triangular entries of the skew-symmetric $A$ and the entries of $B$ of $\Delta = UA + U^\perp B$.

Algorithm 7.7: Stiefel exponential [36].

Input: base point $U \in \mathrm{St}(n, r)$, tangent vector $\Delta \in T_U\mathrm{St}(n, r)$
1: $A := U^T\Delta$   {horizontal component, skew}
2: $QR := \Delta - UA$   {(thin) qr-decomp. of normal component of $\Delta$}
3: $\begin{pmatrix} A & -R^T \\ R & 0 \end{pmatrix} = T\Lambda T^H \in \mathbb{R}^{2r \times 2r}$   {EVD of skew-symmetric matrix}
4: $\begin{pmatrix} M \\ N \end{pmatrix} := T \exp_m(\Lambda) T^H \begin{pmatrix} I_r \\ 0 \end{pmatrix} \in \mathbb{R}^{2r \times r}$
Output: $\tilde{U} := \operatorname{Exp}^{\mathrm{St}}_U(\Delta) = UM + QN \in \mathrm{St}(n, r)$


In applications where $\operatorname{Exp}^{\mathrm{St}}_U(\mu\Delta)$ needs to be evaluated for various parameters $\mu$
as in the example of Section 7.3.4, steps 1.-3. should be computed a priori (offline).
Apart from elementary matrix multiplications, the algorithm requires one to compute
the standard matrix exponential of a skew-symmetric matrix. This, however, is for a
2r-by-2r-matrix and does not scale in the dimension n. With the usual assumption of
model reduction that $n \gg r$, the computational effort is $\mathcal{O}(nr^2)$.
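
A compact sketch of Algorithm 7.7 is given below. As a simplifying assumption, the eigenvalue decomposition of steps 3 and 4 is replaced by a direct call to SciPy's `expm` on the 2r-by-2r skew-symmetric matrix, which yields the same matrix exponential; the function name `stiefel_exp`, the parameter `mu` and the random test data are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def stiefel_exp(U, Delta, mu=1.0):
    """Stiefel exponential Exp_U(mu * Delta) w.r.t. the canonical metric.

    Follows the steps of Algorithm 7.7, but evaluates the exponential of the
    2r-by-2r skew-symmetric matrix with expm instead of an explicit EVD.
    """
    r = U.shape[1]
    A = U.T @ Delta                              # step 1: horizontal component
    Q, R = np.linalg.qr(Delta - U @ A)           # step 2: thin QR of normal component
    K = np.block([[A, -R.T], [R, np.zeros((r, r))]])
    MN = expm(mu * K)[:, :r]                     # steps 3-4: first r columns
    return U @ MN[:r] + Q @ MN[r:]


rng = np.random.default_rng(6)
n, r = 12, 3

U, _ = np.linalg.qr(rng.standard_normal((n, r)))
K0 = rng.standard_normal((r, r))
T = rng.standard_normal((n, r))
Delta = U @ (K0 - K0.T) + (T - U @ (U.T @ T))    # tangent vector, cf. (7.28)

mu = 0.5
U_geo = stiefel_exp(U, Delta, mu)                # geodesic extrapolation
U_lin = U + mu * Delta                           # first-order Taylor extrapolation

defect_geo = np.linalg.norm(U_geo.T @ U_geo - np.eye(r))
defect_lin = np.linalg.norm(U_lin.T @ U_lin - np.eye(r))

# Central finite difference of the geodesic at mu = 0 recovers Delta.
h = 1e-6
velocity = (stiefel_exp(U, Delta, h) - stiefel_exp(U, Delta, -h)) / (2 * h)
```

The geodesic extrapolant keeps orthonormal columns up to rounding errors, whereas the defect of the first-order Taylor extrapolant equals $\mu^2\|\Delta^T\Delta\|_F$.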


The Stiefel logarithm

The Riemannian Stiefel logarithm at a base point $U \in \mathrm{St}(n, r)$ finds for another point
$\tilde{U} \in \mathrm{St}(n, r)$ a Stiefel tangent vector $\Delta$ such that the geodesic that starts from $U$ with
velocity $\Delta$ reaches $\tilde{U}$ after an arc length of $\|\Delta\|_U = \sqrt{\langle \Delta, \Delta \rangle_U}$. Therefore, it provides for
two given data points $U, \tilde{U} \in \mathrm{St}(n, r)$
- a solution to the geodesic endpoint problem: a geodesic that starts from $U$ and
  ends at $\tilde{U}$;
- the Riemannian distance between the given points $U$, $\tilde{U}$.

An efficient algorithm for computing the Stiefel logarithm w.r.t. the canonical metric
was derived in [102].


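Algorithm 7.8: Stiefel logarithm [102].

Input: base point $U \in \mathrm{St}(n, r)$, $\tilde{U} \in \mathrm{St}(n, r)$ 'close' to base point, $\tau > 0$ convergence
threshold
1: $M := U^T\tilde{U} \in \mathbb{R}^{r \times r}$
2: $QN := \tilde{U} - UM \in \mathbb{R}^{n \times r}$   {(thin) qr-decomp. of normal component of $\tilde{U}$}
3: $V_0 := \begin{pmatrix} M & X_0 \\ N & Y_0 \end{pmatrix} \in O(2r)$   {compute orth. completion of the block $\begin{pmatrix} M \\ N \end{pmatrix}$}
4: for $k = 0, 1, 2, \ldots$ do
5:     $\begin{pmatrix} A_k & -B_k^T \\ B_k & C_k \end{pmatrix} := \log_m(V_k)$   {matrix log of orth. matrix}
6:     if $\|C_k\|_2 \leq \tau$ then
7:         break
8:     end if
9:     $\Phi_k := \exp_m(-C_k)$   {matrix exp of skew matrix}
10:    $V_{k+1} := V_kW_k$, where $W_k := \begin{pmatrix} I_r & 0 \\ 0 & \Phi_k \end{pmatrix}$
11: end for
Output: $\Delta := \operatorname{Log}^{\mathrm{St}}_U(\tilde{U}) = UA_k + QB_k \in T_U\mathrm{St}(n, r)$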


The analysis in [102] shows that the algorithm is guaranteed to converge if the input data points $U, \tilde U$ are at most a Euclidean distance of $d = \|U - \tilde U\|_2 < 0.09$ apart. In this case, the algorithm exhibits a linear rate of convergence that depends on $d$ but is smaller than $\frac{1}{2}$. In practice, the algorithm seems to converge whenever the initial $V_0$ is such that its standard matrix logarithm $\log_m(V_0)$ is well-defined. Note that two points on $\mathrm{St}(n,r)$ can be at most a Euclidean distance of 2 away from each other.

Apart from elementary matrix multiplications, the algorithm requires one to compute the standard matrix logarithm of an orthogonal $2r$-by-$2r$ matrix and the standard matrix exponential of a skew-symmetric $r$-by-$r$ matrix at every iteration $k$. Yet, these operations are independent of the dimension $n$. With the usual assumption of model reduction that $r \ll n$, the computational effort is $O(nr^2)$.

For the Stiefel manifold equipped with the Euclidean metric, methods for calcu- 
lating the Stiefel logarithm are introduced in [25]. 
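The iteration of Algorithm 7.8 (canonical metric) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [102]: the names `stiefel_log`, `tau` and `max_iter` are hypothetical, the orthogonal completion is obtained from a full QR decomposition, and an extra Procrustes-type rotation of the completion (not part of the algorithm as stated) is applied so that $Y_0$ is symmetric positive semidefinite, which keeps $V_0$ close to the identity for nearby points. The round-trip check builds $\tilde U$ from a known tangent vector and recovers it.

```python
import numpy as np
from scipy.linalg import expm, logm, qr


def stiefel_log(U, U_tilde, tau=1e-11, max_iter=50):
    """Stiefel logarithm (canonical metric), sketch of the iteration in Algorithm 7.8."""
    r = U.shape[1]
    M = U.T @ U_tilde
    Q, N = qr(U_tilde - U @ M, mode="economic")   # thin QR of the normal component
    MN = np.vstack([M, N])                        # 2r x r block with orthonormal columns
    comp = qr(MN)[0][:, r:]                       # orthogonal completion (X0; Y0)
    # Assumption (not in the text): rotate the completion so that Y0 becomes
    # symmetric positive semidefinite; this avoids eigenvalues of V0 near -1.
    W, _, Zt = np.linalg.svd(comp[r:])
    V = np.hstack([MN, comp @ Zt.T @ W.T])
    for _ in range(max_iter):
        L = np.real(logm(V))                      # block form [[A, -B^T], [B, C]]
        C = L[r:, r:]
        if np.linalg.norm(C, 2) < tau:
            break
        V[:, r:] = V[:, r:] @ expm(-C)            # V_{k+1} = V_k diag(I, Phi_k)
    return U @ L[:r, :r] + Q @ L[r:, :r]


# Round trip on hypothetical data: U_tilde = Exp_U(Delta), then recover Delta
rng = np.random.default_rng(1)
n, r = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
S, T = rng.standard_normal((r, r)), rng.standard_normal((n, r))
Delta = U @ (S - S.T) + (T - U @ (U.T @ T))
Delta *= 0.05 / np.linalg.norm(Delta)             # keep the two points close
A = U.T @ Delta
Qn, Rn = qr(Delta - U @ A, mode="economic")
E = expm(np.block([[A, -Rn.T], [Rn, np.zeros((r, r))]]))
U_tilde = U @ E[:r, :r] + Qn @ E[r:, :r]
Delta_rec = stiefel_log(U, U_tilde)
```

The first $r$ columns of $V_k$ never change during the iteration, so the returned tangent vector always maps back to $\tilde U$ under the Stiefel exponential once $C_k$ has vanished.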


7.4.5 The Grassmann manifold 


This section is devoted to the Grassmann manifold $\mathrm{Gr}(n,r)$ of $r$-dimensional subspaces of $\mathbb{R}^n$ for $r < n$. Every point $\mathcal U \in \mathrm{Gr}(n,r)$, i.e., every subspace, may be represented by selecting a basis $\{u^1,\ldots,u^r\}$ with $\operatorname{ran}(u^1,\ldots,u^r) = \mathcal U$. In numerical schemes, we work exclusively with orthonormal bases. In this way, points $\mathcal U$ on the Grassmann manifold are represented by points $U \in \mathrm{St}(n,r)$ on the Stiefel manifold via $\mathcal U = \operatorname{ran}(U)$. For details and theoretical background, see the references [2, 3, 36]. Modeling data on the Grassmann manifold corresponds to data processing for subspaces and thus allows, for example, for the interpolation/parameterization of POD subspaces; see [19, Chapter 5], [19, Chapter 9]. The most important use case in model reduction is where the subspaces are of low dimension when compared to the surrounding state space, i.e., $n \gg r$. Grassmann interpolation problems in the context of projection-based parametric model reduction are considered in [8, 76, 100, 87]. Subspaces also feature in Krylov subspace approaches; see [20, Chapter 3].


7.4.5.1 Introduction and data representation in numerical schemes 


The set of all $r$-dimensional subspaces $\mathcal U \subset \mathbb{R}^n$ forms the Grassmann manifold

$$\mathrm{Gr}(n,r) := \{\mathcal U \subset \mathbb{R}^n \mid \mathcal U \text{ subspace}, \dim(\mathcal U) = r\}.$$

The Grassmann manifold is a quotient of $O(n)$ under the action of the Lie subgroup $O(r) \times O(n-r) = \left\{ \begin{pmatrix} S & 0 \\ 0 & R \end{pmatrix} \mid S \in O(r), R \in O(n-r) \right\} \le O(n)$. Two matrices $Q, \tilde Q \in O(n)$ are in the same $(O(r) \times O(n-r))$-orbit if and only if the first $r$ columns of $Q$ and $\tilde Q$ span the same subspace and the trailing $n-r$ columns span the corresponding orthogonal complement subspace. Theorem 7.5 applies and shows that $\mathrm{Gr}(n,r) = O(n)/(O(r) \times O(n-r))$ is a homogeneous manifold.

Alternatively, the Grassmann manifold can be realized as a quotient manifold of 
the Stiefel manifold with the help of Theorem 7.4, 


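$$\mathrm{Gr}(n,r) = \mathrm{St}(n,r)/O(r) = \{[U] \mid U \in \mathrm{St}(n,r)\}, \tag{7.30}$$

where the $O(r)$-orbits are $[U] = \{UR \mid R \in O(r)\}$. A matrix $U \in \mathrm{St}(n,r)$ is called a matrix representative of a subspace $\mathcal U \in \mathrm{Gr}(n,r)$ if $\mathcal U = \operatorname{ran}(U)$. The orbit $[U]$ and the subspace $\mathcal U = \operatorname{ran}(U)$ are to be considered as the same object. For any matrix representative $U \in \mathrm{St}(n,r)$ of $\mathcal U \in \mathrm{Gr}(n,r)$, the tangent space of $\mathrm{Gr}(n,r)$ at $\mathcal U$ is represented by

$$T_{\mathcal U}\mathrm{Gr}(n,r) = \{\Delta \in \mathbb{R}^{n \times r} \mid U^T\Delta = 0\} \subset \mathbb{R}^{n \times r}.$$

Every tangent vector $\Delta \in T_{\mathcal U}\mathrm{Gr}(n,r)$ may be written as

$$\Delta = (I - UU^T)T, \quad T \in \mathbb{R}^{n \times r} \text{ arbitrary, or} \tag{7.31}$$
$$\Delta = U^\perp B, \quad B \in \mathbb{R}^{(n-r) \times r} \text{ arbitrary,} \tag{7.32}$$

where in the latter case, $U^\perp \in \mathrm{St}(n, n-r)$ is such that $(U, U^\perp) \in O(n)$ is a square orthogonal matrix. The dimension of both $T_{\mathcal U}\mathrm{Gr}(n,r)$ and $\mathrm{Gr}(n,r)$ is $nr - r^2$.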


7.4.5.2 Distances and geodesics 


A metric on $T_{\mathcal U}\mathrm{Gr}(n,r)$ can be obtained by making use of the fact that the Grassmannian is a quotient of the Stiefel manifold. Alternatively, one can restrict the standard inner matrix product $\langle A, B \rangle_0 = \operatorname{trace}(A^TB)$ to the Grassmann tangent space. In the case of the Grassmannian, the two approaches lead to the same metric

$$\langle \Delta, \tilde\Delta \rangle_{\mathcal U} = \operatorname{trace}(\Delta^T \tilde\Delta) = \langle \Delta, \tilde\Delta \rangle_0;$$

see [36, §2.5].


The Grassmann exponential 

The Riemannian Grassmann exponential at a base point $\mathcal U \in \mathrm{Gr}(n,r)$ sends a Grassmann tangent vector $\Delta$ to the endpoint $\tilde{\mathcal U} \in \mathrm{Gr}(n,r)$ of a geodesic that starts from $\mathcal U$ with velocity vector $\Delta$. Therefore, it provides at the same time an expression for the geodesic curves on $\mathrm{Gr}(n,r)$. An efficient algorithm for computing the Grassmann exponential was derived in [36, §2.5.1]:

Algorithm 7.9: Grassmann exponential [36].

Input: base point $\mathcal U = [U] \in \mathrm{Gr}(n,r)$, where $U \in \mathrm{St}(n,r)$, tangent vector $\Delta \in T_{\mathcal U}\mathrm{Gr}(n,r)$
1: $Q \Sigma V^T := \Delta$, with $Q \in \mathrm{St}(n,r)$ {(thin) SVD of tangent vector}
2: $\tilde U := UV\cos(\Sigma)V^T + Q\sin(\Sigma)V^T$ {cos and sin act only on diagonal entries}

Output: $\tilde{\mathcal U} := \operatorname{Exp}_{\mathcal U}^{Gr}(\Delta) = [\tilde U] \in \mathrm{Gr}(n,r)$.


Apart from elementary matrix multiplications, the algorithm requires one to compute the singular value decomposition of an $n$-by-$r$ matrix. The computational effort is $O(nr^2)$.
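The two steps of Algorithm 7.9 translate almost literally into NumPy. The sketch below is illustrative only; the function name `grassmann_exp` and the random data are hypothetical, and the tangent vector is generated by the projection (7.31).

```python
import numpy as np


def grassmann_exp(U, Delta):
    """Grassmann exponential: U V cos(Sigma) V^T + Q sin(Sigma) V^T."""
    Q, sigma, Vt = np.linalg.svd(Delta, full_matrices=False)   # thin SVD
    # Scaling the columns by cos/sin of the singular values applies the
    # functions to the diagonal entries of Sigma only.
    return ((U @ Vt.T) * np.cos(sigma)) @ Vt + (Q * np.sin(sigma)) @ Vt


# Hypothetical data: random base point and a tangent vector with U^T Delta = 0
rng = np.random.default_rng(0)
n, r = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
T = rng.standard_normal((n, r))
Delta = T - U @ (U.T @ T)            # projection (I - U U^T) T, cf. (7.31)
U_new = grassmann_exp(U, 0.2 * Delta)
```

The returned matrix is one representative of the endpoint subspace; any right-multiplication by an orthogonal $r$-by-$r$ matrix represents the same point on $\mathrm{Gr}(n,r)$.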


The Grassmann logarithm 
The Riemannian Grassmann logarithm at a base point $\mathcal U \in \mathrm{Gr}(n,r)$ finds for another point $\tilde{\mathcal U} \in \mathrm{Gr}(n,r)$ a Grassmann tangent vector $\Delta$ such that the geodesic that starts from $\mathcal U$ with velocity $\Delta$ reaches $\tilde{\mathcal U}$ after an arc length of $\|\Delta\|_{\mathcal U} = \sqrt{\langle \Delta, \Delta \rangle_{\mathcal U}}$. Therefore, it provides for two given data points $\mathcal U, \tilde{\mathcal U} \in \mathrm{Gr}(n,r)$

- a solution to the geodesic endpoint problem: a geodesic that starts from $\mathcal U$ and ends at $\tilde{\mathcal U}$;
- the Riemannian distance between the given points $\mathcal U, \tilde{\mathcal U}$.

An algorithm for computing the Grassmann logarithm is stated implicitly in [2, §3.8, p. 210]. The reference [40] features expressions for the Grassmann exponential and the corresponding logarithm that formally work with Grassmann representatives in $SO(n)/(SO(r) \times SO(n-r))$ but also keep the computational effort $O(nr^2)$. Reference [82, §4.3] gives the corresponding mappings after identifying subspaces with orthoprojectors; see also [16].


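Algorithm 7.10: Grassmann Logarithm.

Input: base point $\mathcal U = [U] \in \mathrm{Gr}(n,r)$ with $U \in \mathrm{St}(n,r)$, $\tilde{\mathcal U} = [\tilde U] \in \mathrm{Gr}(n,r)$ with $\tilde U \in \mathrm{St}(n,r)$.
1: $M := U^T \tilde U$
2: $L := (I - UU^T)\tilde U M^{-1} = \tilde U M^{-1} - U$
3: $Q \Sigma V^T := L$ {(thin) SVD}
4: $\Delta := Q \arctan(\Sigma) V^T$ {arctan acts only on diagonal entries}

Output: $\Delta = \operatorname{Log}_{\mathcal U}^{Gr}(\tilde{\mathcal U}) \in T_{\mathcal U}\mathrm{Gr}(n,r)$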


The composition $\operatorname{Exp}_{\mathcal U}^{Gr} \circ \operatorname{Log}_{\mathcal U}^{Gr}$ is the identity on $\mathrm{Gr}(n,r)$, wherever it is defined. Yet, on the level of the actual matrix representatives, the operation

$$(\operatorname{Exp}_{\mathcal U}^{Gr} \circ \operatorname{Log}_{\mathcal U}^{Gr})([\tilde U_{in}]) = [\tilde U_{out}]$$

produces a matrix $\tilde U_{out} \ne \tilde U_{in}$. Directly recovering the input matrix can be achieved via a Procrustes-type preprocessing step, where $\tilde U$ is replaced with $\tilde U_* := \tilde U \Phi$, $\Phi = \arg\min_{\Phi \in O(r)} \|U - \tilde U \Phi\|_F$. This leads to the following.


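Algorithm 7.11: Grassmann Logarithm: modified version.²⁰

Input: base point $\mathcal U = [U] \in \mathrm{Gr}(n,r)$ with $U \in \mathrm{St}(n,r)$, $\tilde{\mathcal U} = [\tilde U] \in \mathrm{Gr}(n,r)$ with $\tilde U \in \mathrm{St}(n,r)$.
1: $W S R^T := \tilde U^T U$ {SVD}
2: $\tilde U_* := \tilde U (W R^T)$ {'Transition to Procrustes representative'}
3: $L := (I - UU^T)\tilde U_*$
4: $Q \Sigma V^T := L$ {(thin) SVD}
5: $\Delta := Q \arcsin(\Sigma) V^T$ {arcsin acts only on diagonal entries}

Output: $\Delta = \operatorname{Log}_{\mathcal U}^{Gr}(\tilde{\mathcal U}) \in T_{\mathcal U}\mathrm{Gr}(n,r)$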


An additional advantage of the modified Grassmann logarithm is that the matrix inversion $M^{-1} = (U^T \tilde U)^{-1}$ is avoided. In fact, it is replaced by the SVD $W S R^T = \tilde U^T U$ that is used to solve the Procrustes problem $\min_{\Phi \in O(r)} \|U - \tilde U \Phi\|_F$. The SVD exists also if $\tilde U^T U$ does not have full rank.
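Both variants of the Grassmann logarithm can be compared numerically. The following sketch implements the steps of Algorithm 7.11 (arcsin, Procrustes representative) and of Algorithm 7.10 (arctan, explicit inverse of $M$) and evaluates them on the same pair of nearby subspaces; the function names and the test data are hypothetical, and `np.clip` is added only as a guard against round-off.

```python
import numpy as np


def grassmann_log(U, U_tilde):
    """Modified Grassmann logarithm via a Procrustes representative."""
    W, _, Rt = np.linalg.svd(U_tilde.T @ U)        # W S R^T = U_tilde^T U
    U_star = U_tilde @ (W @ Rt)                    # Procrustes representative
    L = U_star - U @ (U.T @ U_star)                # (I - U U^T) U_star
    Q, sigma, Vt = np.linalg.svd(L, full_matrices=False)
    # clip guards against singular values slightly above 1 due to round-off
    return (Q * np.arcsin(np.clip(sigma, 0.0, 1.0))) @ Vt


def grassmann_log_arctan(U, U_tilde):
    """Arctan variant; requires M = U^T U_tilde to be invertible."""
    M = U.T @ U_tilde
    L = np.linalg.solve(M.T, U_tilde.T).T - U      # U_tilde M^{-1} - U
    Q, sigma, Vt = np.linalg.svd(L, full_matrices=False)
    return (Q * np.arctan(sigma)) @ Vt


# Hypothetical data: two nearby subspaces of R^10 of dimension 3
rng = np.random.default_rng(0)
n, r = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
U_tilde, _ = np.linalg.qr(U + 0.1 * rng.standard_normal((n, r)))
D_arcsin = grassmann_log(U, U_tilde)
D_arctan = grassmann_log_arctan(U, U_tilde)
```

For subspaces whose principal angles are all below $\pi/2$, the two functions return the same tangent vector up to round-off; only the arcsin variant remains usable when $U^T \tilde U$ is rank deficient.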


Distances between subspaces 

The Riemannian logarithm provides the distance between two subspaces $\mathcal U = [U], \tilde{\mathcal U} = [\tilde U] \in \mathrm{Gr}(n,r)$ as follows: First, compute $\Delta = \operatorname{Log}_{\mathcal U}^{Gr}(\tilde{\mathcal U})$, then compute $\|\Delta\|_{\mathcal U} = \operatorname{dist}_{Gr}(\mathcal U, \tilde{\mathcal U})$. In practice, however, this boils down to computing the singular values of the matrix $M = U^T \tilde U$, which can be seen as follows. By Algorithm 7.11, $\|\Delta\|_{\mathcal U}^2 = \operatorname{trace}(\Delta^T \Delta) = \sum_{k=1}^r \arcsin(\sigma_k)^2$, where the $\sigma_k$'s are the singular values of $L = (I - UU^T)\tilde U_*$. These match precisely the square roots of the eigenvalues of $L^T L$. Using the SVD of the square matrix $\tilde U^T U = W S R^T$ as in steps 1 and 2 of Algorithm 7.11, the eigenvalues of $L^T L$ can be read off from

$$L^T L = \tilde U_*^T (I - UU^T) \tilde U_* = I - R S^2 R^T = R (I - S^2) R^T,$$

so that $\sigma_k^2 = 1 - s_k^2$ when consistently ordered. As a consequence, $s_k = \sqrt{1 - \sigma_k^2} = \cos(\arcsin(\sigma_k))$, which implies

$$\operatorname{dist}(\mathcal U, \tilde{\mathcal U}) = \left( \sum_{k=1}^r \arcsin(\sigma_k)^2 \right)^{1/2} = \left( \sum_{k=1}^r \arccos(s_k)^2 \right)^{1/2}, \tag{7.33}$$

where $\sigma_1, \ldots, \sigma_r$ and $s_1, \ldots, s_r$ are the singular values of $L$ and $\tilde U^T U$, respectively.
The numerical linear algebra literature knows a variety of distance measures for subspaces. Essentially, all of them are based on the principal angles [36, §2.5.1, §4.3].


20 This is an original contribution to this chapter; for a detailed discussion, see [17, Section 5.2].

The principal angles (or canonical angles) $\theta_1, \ldots, \theta_r \in [0, \frac{\pi}{2}]$ between two subspaces $[U], [\tilde U] \in \mathrm{Gr}(n,r)$ are defined recursively by

$$\cos(\theta_k) := u_k^T v_k = \max_{\substack{u \in [U],\ \|u\| = 1 \\ u \perp u_1, \ldots, u_{k-1}}} \ \max_{\substack{v \in [\tilde U],\ \|v\| = 1 \\ v \perp v_1, \ldots, v_{k-1}}} u^T v.$$

The principal angles can be computed via $\theta_k := \arccos(s_k) \in [0, \frac{\pi}{2}]$, where $s_k$ is the $k$th singular value of $U^T \tilde U \in \mathbb{R}^{r \times r}$ [42, §6.4.3]. Hence, the Riemannian subspace distance (7.33) expressed in terms of the principal angles is precisely

$$\operatorname{dist}([U], [\tilde U]) := \|\Theta\|_2, \quad \Theta = (\theta_1, \ldots, \theta_r) \in \mathbb{R}^r. \tag{7.34}$$

In particular, (7.34) shows that any two points on $\mathrm{Gr}(n,r)$ can be connected by a geodesic of length at most $\frac{\sqrt{r}}{2}\pi$; see also [98, Thm 8(b)].
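In code, the subspace distance (7.34) needs nothing but the singular values of $U^T \tilde U$. The sketch below is a minimal illustration with hypothetical names and examples; the clipping is an added safeguard against singular values that exceed 1 by round-off.

```python
import numpy as np


def subspace_distance(U, U_tilde):
    """Riemannian distance on Gr(n, r) from the principal angles."""
    s = np.linalg.svd(U.T @ U_tilde, compute_uv=False)   # cosines of principal angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))              # clip guards against round-off
    return np.linalg.norm(theta)


# Two planes in R^4 (hypothetical example): span{e1, e2} and span{e3, e4}
E = np.eye(4)
d_orth = subspace_distance(E[:, :2], E[:, 2:])            # both angles equal pi/2

# A line in R^2 rotated by 0.3 rad
d_line = subspace_distance(np.array([[1.0], [0.0]]),
                           np.array([[np.cos(0.3)], [np.sin(0.3)]]))
```

The first example attains the upper bound $\frac{\sqrt{r}}{2}\pi$ with $r = 2$, since the two planes are mutually orthogonal.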


7.5 Conclusion 


Interpolation of structured matrices is a viable building block in parametric model re- 
duction approaches. In order to preserve the characteristic features, the matrix sets 
in question are considered as geometric entities, so-called differentiable manifolds. 
In this chapter, we exposed how concepts from Riemannian geometry apply in de- 
signing manifold counterparts to Euclidean interpolation algorithms. As examples, 
the generic approach of interpolating in Riemannian normal coordinates, the quasi- 
linear, geodesic interpolation method and interpolation via the Riemannian center of 
mass are discussed. All the aforementioned methods share many of their constituent 
algorithmic units and acquaintance with these units allows one to adapt and modify 
the established approaches as needed or to design new ones. In this spirit, for a se- 
lection of matrix manifolds that feature frequently in practical applications, namely, 
the general linear group, the orthogonal group, the set of symmetric positive definite 
matrices, the Stiefel manifold and the Grassmann manifold, we have gathered the es- 
sential geometric concepts and formulas necessary to conduct Riemannian interpola- 
tion. 


Appendix A 


The matrix exponential and logarithm 
The standard matrix exponential and matrix logarithm are defined via the power series

$$\exp_m(X) := \sum_{j=0}^{\infty} \frac{X^j}{j!}, \qquad \log_m(X) := \sum_{j=1}^{\infty} (-1)^{j+1} \frac{(X - I)^j}{j}. \tag{7.35}$$

For $X \in \mathbb{R}^{n \times n}$, $\exp_m(X)$ is invertible with inverse $\exp_m(-X)$. The following restrictions of the exponential map are important:

$$\exp_m|_{\mathrm{sym}(n)} : \mathrm{sym}(n) \to \mathrm{SPD}(n), \qquad \exp_m|_{\mathrm{skew}(n)} : \mathrm{skew}(n) \to \mathrm{SO}(n).$$

The former is a diffeomorphism [79, Thm. 2.8], the latter is a differentiable, surjective map [41, §3.11, Thm. 9]. For additional properties and efficient methods for numerical computation, see [50, §10, 11].

A few properties of the exponential function for real or complex numbers carry over to the matrix exponential. However, since matrices do not commute in general, the standard exponential law is replaced by

$$\exp_m(Z(X, Y)) = \exp_m(X) \exp_m(Y), \tag{7.36}$$
$$Z(X, Y) = X + Y + \frac{1}{2}[X, Y] + \frac{1}{12}\bigl([X, [X, Y]] + [Y, [Y, X]]\bigr) - \frac{1}{24}[Y, [X, [X, Y]]] \ldots,$$

where $[X, Y] = XY - YX$ is the commutator bracket, or Lie bracket. This is Dynkin's formula for the Baker-Campbell-Hausdorff series; see [86, §1.3, p. 22]. From a theoretical point of view, it is important that all terms in this series can be expressed in terms of the Lie bracket. A special case is

$$\exp_m(X + Y) = \exp_m(X) \exp_m(Y), \quad \text{if } [X, Y] = 0.$$
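The role of the commutator can be checked numerically. The following sketch, with hypothetical matrices and helper names, compares three situations: the naive exponential law for non-commuting matrices, the series truncated after the first commutator term for small arguments, and the commuting special case.

```python
import numpy as np
from scipy.linalg import expm


def comm(X, Y):
    """Commutator bracket [X, Y] = XY - YX."""
    return X @ Y - Y @ X


def bch2(X, Y):
    """Baker-Campbell-Hausdorff series truncated after the first commutator."""
    return X + Y + 0.5 * comm(X, Y)


X = np.array([[0.0, 1.0], [0.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0]])

# Non-commuting case: exp(X + Y) differs from exp(X) exp(Y)
gap_naive = np.linalg.norm(expm(X + Y) - expm(X) @ expm(Y))

# For small t, the truncated series matches exp(tX) exp(tY) up to O(t^3)
t = 1e-2
gap_bch = np.linalg.norm(expm(bch2(t * X, t * Y)) - expm(t * X) @ expm(t * Y))

# Commuting case: Y2 is a polynomial in X2, hence [X2, Y2] = 0
X2 = np.array([[0.1, 0.2], [0.0, 0.3]])
Y2 = X2 @ X2 - X2
gap_commuting = np.linalg.norm(expm(X2 + Y2) - expm(X2) @ expm(Y2))
```

Here `gap_naive` is of order one, `gap_bch` is of the order of $t^3$, and `gap_commuting` vanishes up to round-off.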


Matrix square roots and the polar decomposition 

Every $S \in \mathrm{SPD}(n)$ has a unique matrix square root in $\mathrm{SPD}(n)$, i.e., a matrix denoted by $S^{1/2}$ with the property $S^{1/2} S^{1/2} = S$. This square root can be obtained via an EVD $S = Q \Lambda Q^T$ by setting

$$S^{1/2} := Q \sqrt{\Lambda}\, Q^T,$$

where $Q \in O(n)$, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and $\lambda_i > 0$ are the eigenvalues of $S$. Every $A \in \mathrm{GL}(n)$ can be uniquely decomposed into an orthogonal matrix times a symmetric positive definite matrix,

$$A = QP = Q \exp_m(X), \quad Q \in O(n),\ P \in \mathrm{SPD}(n),\ X \in \mathrm{sym}(n).$$

The polar factors can be constructed by taking the square root of the assuredly positive definite matrix $A^T A$ and subsequently setting $P := (A^T A)^{1/2}$ and $Q := A P^{-1}$. Because the restriction of $\exp_m$ to the symmetric matrices is a diffeomorphism onto $\mathrm{SPD}(n)$, there is a unique $X \in \mathrm{sym}(n)$ with $P = \exp_m(X)$. For details, see [46, Thm. 2.18].
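The construction of the polar factors can be written down directly. The sketch below follows the recipe above using a symmetric eigenvalue decomposition; the function names and the test matrix are hypothetical, and the matrix is shifted by a multiple of the identity only to make sure it is invertible.

```python
import numpy as np


def spd_sqrt(S):
    """Unique SPD square root via the EVD S = Q Lambda Q^T."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(lam)) @ Q.T


def polar(A):
    """Polar factors A = Q P with Q orthogonal and P = (A^T A)^{1/2}."""
    P = spd_sqrt(A.T @ A)
    Q = np.linalg.solve(P, A.T).T        # A P^{-1}, using the symmetry of P
    return Q, P


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # hypothetical invertible matrix
Q, P = polar(A)
```

Solving a linear system with $P$ avoids forming $P^{-1}$ explicitly.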
The Procrustes problem
Let $A, B \in \mathbb{R}^{n \times r}$. The Procrustes problem aims at finding an orthogonal transformation $R^* \in O(r)$ such that $R^*$ is the minimizer of

$$\min_{R \in O(r)} \|A - BR\|_F.$$

The optimal $R^*$ is $R^* = UV^T$, where $B^T A = U S V^T \in \mathbb{R}^{r \times r}$ is an SVD of $B^T A$; see [42].
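The SVD-based solution of the Procrustes problem fits in two lines. The example below is a sketch with hypothetical names and data: it builds $A = B R_{true}$ for a known orthogonal matrix and checks that the minimizer recovers it.

```python
import numpy as np


def procrustes(A, B):
    """Orthogonal R minimizing ||A - B R||_F, from the SVD of B^T A."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt


rng = np.random.default_rng(0)
B = rng.standard_normal((8, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # hypothetical orthogonal matrix
A = B @ R_true
R_opt = procrustes(A, B)
```

Since $B$ has full column rank in this example, the minimizer is unique and coincides with `R_true`.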
8 Vector fitting


Abstract: We introduce the Vector Fitting algorithm for the creation of reduced or- 
der models from the sampled response of a linear time-invariant system. This data- 
driven approach to reduction is particularly useful when the system under modeling 
is known only through experimental measurements. The theory behind Vector Fitting 
is presented for single- and multiple-input systems, together with numerical details, 
pseudo-codes, and an open-source implementation [75]. We discuss how the reduced 
model can be made stable and converted to a variety of forms for use in virtually any 
modeling context. Finally, we survey recent extensions of the Vector Fitting algorithm 
geared towards time domain, parametric and distributed systems modeling. 


Keywords: vector fitting, data-driven modeling, macromodeling, stability, passivity 


MSC 2010: 65D10, 37M05, 93C80, 94C99 


8.1 Introduction and motivation 


The Vector Fitting (VF) algorithm [42, 35] is one of the most successful techniques for 
creating reduced order models for linear systems starting from samples of their re- 
sponse. Samples may originate from an experimental measurement or from a prior 
numerical simulation. This need arises in many practical scenarios, and we cite two 
examples. 

A biomedical engineer may need a linear model describing blood flow in a portion
of the human cardiovascular system, and have simultaneous in-vivo measurements of 
pressure and flow rate at the inlets and outlets of the region of interest. With a data- 
driven algorithm for model order reduction, such as VF, the reduced model can be 
created directly from experimental observations. 

As a second example, we consider an electronic engineer who needs a model for a radio-frequency amplifier or an antenna, to be used for design purposes. If the device is provided by a third party, a measurement may be the only way to characterize the system. High-frequency measurements are typically performed in the frequency domain, and return the impedance or admittance seen between the ports of the device, measured at various frequencies $\omega_k$. From these samples, VF can create a reduced model which can be represented as a set of differential equations or as an equivalent circuit for use in subsequent simulations, including those performed in the time domain.

The main advantage of a data-driven approach to reduced order modeling is that 
only samples of the system response are required. This feature makes data-driven re- 
duction a natural choice when experimental measurements are readily available. Fur- 
thermore, data-driven reduction can also be applied when samples originate from a 
numerical simulation based on first-principles equations, such as Maxwell’s equa- 
tions for electromagnetic phenomena. Although in this second scenario one could 
technically use equation-driven methods, the available simulator may not allow the 
user to export the discretized first-principles equations for reduction. This is the case 
for most commercial simulators used by industry. The main disadvantage of data- 
driven reduction is that it offers less physical insight into the system under modeling, 
since it leads to a “black-box” reduced model. By starting from a first-principles model, 
equation-driven methods are typically better in this regard, since they can provide to 
the user more information about which features of the original model were retained, 
and which features were discarded. 


8.2 The Sanathanan-Koerner algorithm 


8.2.1 Problem statement 


We assume that the system under modeling is linear and time-invariant, with input $u(t) \in \mathbb{R}^{\bar m}$ and output $y(t) \in \mathbb{R}^{\bar q}$. Because of linearity and time-invariance, the output can be written as a convolution

$$y(t) = \int_{-\infty}^{+\infty} h(t - \tau)\, u(\tau)\, d\tau \tag{8.1}$$

between input $u(t)$ and the impulse response $h(t) \in \mathbb{R}^{\bar q \times \bar m}$ of the system, which is unknown. Applying the Laplace transform to both sides of (8.1), we get

$$Y(s) = H(s) U(s), \tag{8.2}$$

where $s = \sigma + j\omega$ is complex frequency. In (8.2), $U(s) \in \mathbb{C}^{\bar m}$ and $Y(s) \in \mathbb{C}^{\bar q}$ are the Laplace transforms of $u(t)$ and $y(t)$, respectively, and $H(s) \in \mathbb{C}^{\bar q \times \bar m}$ is the transfer function of the system. The VF algorithm solves the following problem. Given $\bar k$ measurements of the transfer function

$$H_k = H(j\omega_k), \quad k = 1, \ldots, \bar k, \tag{8.3}$$

determine a rational model $H(s)$ that approximates the given measurements

$$H(j\omega_k) \approx H_k \quad \forall k = 1, \ldots, \bar k. \tag{8.4}$$

In VF, $H(s)$ is chosen to be a rational function. Rational functions are universal approximators, and can therefore approximate a wide range of functions with arbitrary accuracy. Moreover, since the transfer function of lumped systems is rational by construction, this is a natural choice to model dynamical systems. Finally, rational functions can be represented as a state-space system, a poles-residue form, a set of differential equations, an equivalent electric circuit and many other forms. This flexibility facilitates the integration of the reduced model into existing software for computational mathematics and system simulation.


8.2.2 The Levy and Sanathanan-Koerner algorithms 


The first attempts to solve (8.4) numerically date back at least to the 1950s, with the work of Levy, Sanathanan and Koerner among others. We briefly summarize their work since the VF algorithm can be better understood from that perspective. For simplicity, we initially consider the case of a system with a single input and a single output ($\bar m = \bar q = 1$). The general case will be discussed in Section 8.3.5.

In order to solve the approximation problem (8.4), we must first choose a suitable parametric form for $H(s)$, which is the model that we want to estimate from the given samples. The most natural choice is to let $H(s)$ be the ratio of two polynomials

$$H(s) = \frac{n(s)}{d(s)} = \frac{\sum_{n=0}^{\tilde n} a_n s^n}{\sum_{n=0}^{\tilde n} b_n s^n}, \tag{8.5}$$

where $a_n, b_n \in \mathbb{R}$ are unknowns, and $\tilde n$ is the order of the desired model. Since one coefficient can be normalized, one of the denominator coefficients $b_n$ is fixed to one. In (8.5), we chose the same degree $\tilde n$ for numerator and denominator. This choice is appropriate for transfer functions that are known to be bounded when $s \to \infty$. This is the case of the scattering coefficients used to model electronic devices at high frequencies, as in the example in Section 8.3.7. In other applications, the transfer function of the system under modeling may grow polynomially as $s$ increases. This is the case, for example, of the impedance and admittance coefficients of passive electrical circuits, which can grow linearly with $s$. As an example, one can consider the impedance $Z(s) = sL$ of an inductor. In such cases, the degree of the numerator of (8.5) should be increased to $\tilde n + 1$. This change leads to minor modifications to the algorithms presented in this chapter, which will not be discussed here, but can be found in [35].

After choosing the form of model (8.5), we have to determine its coefficients $a_n$ and $b_n$ in order to satisfy (8.4), minimizing a suitable norm between samples $H_k$ and model response $H(j\omega_k)$. We choose the $L_2$ norm, and aim to minimize

$$e^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| H_k - H(j\omega_k) \right|^2. \tag{8.6}$$

Minimizing (8.6) is a nonlinear least squares problem, due to the unknowns $b_n$ in the denominator. Although nonlinear optimization algorithms can be directly applied to (8.6), experience shows that they can be quite time consuming and prone to local minima. A different approach is preferred, where (8.6) is linearized into a linear least squares problem, which can be solved efficiently and robustly with the QR decomposition [28].

We first rewrite (8.6) as

$$e^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| \frac{H_k \sum_{n=0}^{\tilde n} b_n (j\omega_k)^n - \sum_{n=0}^{\tilde n} a_n (j\omega_k)^n}{\sum_{n=0}^{\tilde n} b_n (j\omega_k)^n} \right|^2. \tag{8.7}$$


Levy proposed to linearize (8.7) by simply neglecting the denominator, and minimize [54]

$$(e_L)^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| H_k \sum_{n=0}^{\tilde n} b_n (j\omega_k)^n - \sum_{n=0}^{\tilde n} a_n (j\omega_k)^n \right|^2, \tag{8.8}$$

which ultimately boils down to solving a system of linear equations in the least squares sense. Unfortunately, this simple trick typically fails to provide an accurate solution of (8.4). Indeed, error functionals (8.7) and (8.8) are equivalent only when $\sum_{n=0}^{\tilde n} b_n (j\omega)^n$ is approximately constant, which is rarely the case. Furthermore, the monomial terms $(j\omega)^n$ in (8.8) will result in Vandermonde matrices in the least-squares problem to be solved, which are ill-conditioned [28].

To overcome this issue, Sanathanan and Koerner proposed an iterative process to improve the quality of the solution [69]. In the first iteration ($i = 1$), the Levy functional (8.8) is minimized, providing a first estimate of the model coefficients that we denote as $a_n^{(1)}$ and $b_n^{(1)}$. In successive iterations ($i \ge 2$), the following linearization of (8.7) is minimized:

$$(e_{SK}^{(i)})^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| \frac{H_k \sum_{n=0}^{\tilde n} b_n^{(i)} (j\omega_k)^n - \sum_{n=0}^{\tilde n} a_n^{(i)} (j\omega_k)^n}{\sum_{n=0}^{\tilde n} b_n^{(i-1)} (j\omega_k)^n} \right|^2, \tag{8.9}$$

leading to a new estimate of model coefficients $a_n^{(i)}$ and $b_n^{(i)}$. We can see that, in (8.9), the coefficients $b_n^{(i-1)}$ from the previous iteration are used to approximate the "nonlinear" term in (8.7). Since the unknowns $a_n^{(i)}$ and $b_n^{(i)}$ appear only in the numerator, the Sanathanan-Koerner method only requires the solution of linear least squares problems. If the iterative process converges, $b_n^{(i-1)} \to b_n^{(i)}$, and (8.9) becomes equivalent to (8.7). We can see that the term $\sum_{n=0}^{\tilde n} b_n^{(i-1)} (j\omega_k)^n$ in the denominator of (8.9) acts as a frequency-dependent weight of the least squares problem. This weight aims to progressively remove the bias introduced in the linearization of (8.6). For discrete-time systems, the counterpart of the Sanathanan-Koerner method was proposed by Steiglitz and McBride [73].
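The Levy step and the Sanathanan-Koerner reweighting can be sketched for a single-input single-output model of low order. The code below is an illustration under explicit assumptions, not the original authors' implementation: the denominator is normalized to be monic (leading coefficient equal to one), real and imaginary parts are stacked so that the coefficients come out real, and the names `sk_fit`, `order` and `n_iter` are hypothetical. The first pass uses a unit weight and therefore minimizes the Levy functional (8.8).

```python
import numpy as np


def sk_fit(omega, H, order, n_iter=5):
    """Sanathanan-Koerner iteration with a monomial basis (small orders only)."""
    s = 1j * omega
    V = s[:, None] ** np.arange(order + 1)        # columns (j*omega)^n, n = 0..order
    weight = np.ones_like(s)                      # first pass = Levy functional
    for _ in range(n_iter):
        # Unknowns x = [a_0..a_order, b_0..b_{order-1}]; assumption: b_order = 1
        A = np.hstack([-V, H[:, None] * V[:, :order]]) / weight[:, None]
        rhs = -(H * V[:, order]) / weight
        # Stack real and imaginary parts so that the solution is real
        x, *_ = np.linalg.lstsq(np.vstack([A.real, A.imag]),
                                np.concatenate([rhs.real, rhs.imag]), rcond=None)
        a, b = x[:order + 1], np.append(x[order + 1:], 1.0)
        weight = V @ b                            # d(j*omega_k), weight of the next pass
    return a, b


# Hypothetical noise-free samples of H(s) = (s + 2) / (s^2 + 3 s + 5)
omega = np.linspace(0.1, 10.0, 50)
s = 1j * omega
H = (s + 2.0) / (s**2 + 3.0 * s + 5.0)
a, b = sk_fit(omega, H, order=2)
```

For exact data of the correct order the linearized residual can be driven to zero, so the coefficients are recovered in the first pass; the reweighting matters for noisy data or an under-modeled order.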
8.2.3 Numerical issues of the Sanathanan-Koerner method


The work of Sanathanan and Koerner solves (8.4) accurately using only linear least squares problems. Unfortunately, this method can still suffer from severe numerical issues when applied to realistic problems, where the required model order $\tilde n$ may be quite large and frequency $\omega$ may span several orders of magnitude. For example, VF is extensively used in integrated circuit design to model the interconnect network that distributes signals and power across the circuit. In this application, the frequency range of interest typically extends from a few MHz to tens of GHz, for about four decades of variation. The numerical issues associated with the Sanathanan-Koerner method arise from two factors:

(a) The error (8.9) contains high powers of frequency $(\omega_k)^n$, leading to very poor conditioning. Specifically, the matrix of the least-squares problem to be solved will contain Vandermonde blocks [28], which are known to be ill-conditioned even for relatively modest values of $\tilde n$.

(b) The weighting term $\sum_{n=0}^{\tilde n} b_n^{(i-1)} (j\omega_k)^n$ in the denominator of (8.9) typically exhibits large variations over $[\omega_1, \omega_{\bar k}]$, which further degrade the conditioning of the least squares problem.
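The growth of the condition number described in item (a) is easy to observe. The sketch below uses a hypothetical sampling of 200 logarithmically spaced frequencies over four decades, already normalized to $[10^{-4}, 1]$, and evaluates the condition number of the Vandermonde block for three model orders.

```python
import numpy as np


def vandermonde_cond(omega, order):
    """Condition number of the matrix with columns (j*omega)^n, n = 0..order."""
    s = 1j * omega
    V = s[:, None] ** np.arange(order + 1)
    return np.linalg.cond(V)


# Hypothetical sampling: four decades of frequency, normalized to [1e-4, 1]
omega = np.logspace(-4, 0, 200)
conds = [vandermonde_cond(omega, order) for order in (2, 6, 10)]
```

Without frequency normalization the columns would additionally differ by many orders of magnitude in scale, which makes the situation worse.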


8.3 The Vector Fitting algorithm 


The VF algorithm, conceived by Gustavsen and Semlyen [42], addresses both problems 
with a simple yet brilliant solution. 


8.3.1 A new basis function and implicit weighting 


In order to avoid the ill-conditioning arising from the $s^n$ terms in (8.5), VF replaces those terms with partial fractions. The numerator and denominator of $H(s)$ are written as

$$n^{(i)}(s) = c_0^{(i)} + \sum_{n=1}^{\tilde n} \frac{c_n^{(i)}}{s - p_n^{(0)}}, \tag{8.10}$$
$$d^{(i)}(s) = 1 + \sum_{n=1}^{\tilde n} \frac{d_n^{(i)}}{s - p_n^{(0)}}, \tag{8.11}$$

where $p_n^{(0)} \in \mathbb{C}$ are a set of initial poles, whose choice will be discussed later on. We see that, without loss of generality, the constant term in (8.11) has been normalized to one. In comparison to the monomial basis functions $s^n$ used by the Sanathanan-Koerner iteration, which vary wildly as $s$ increases, partial fractions $\frac{1}{s - p_n^{(0)}}$ have more contained variations over frequency if the poles $p_n^{(0)}$ are chosen appropriately [43], as will be discussed in Section 8.3.2. This feature leads to better conditioning, especially if the poles $p_n^{(0)}$ are distinct and well separated.

The introduction of partial fractions is also crucial to address the second issue discussed in Section 8.2.3, and perform an implicit weighting of (8.9). To understand how VF achieves this, we first give a different interpretation to linearized error (8.9). In terms of (8.10) and (8.11), error (8.9) can be expressed as

$$(e_{SK}^{(i)})^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| \frac{d^{(i)}(j\omega_k)}{d^{(i-1)}(j\omega_k)} H_k - \frac{n^{(i)}(j\omega_k)}{d^{(i-1)}(j\omega_k)} \right|^2. \tag{8.12}$$

We can see that this expression involves the two new quantities

$$w^{(i)}(s) = \frac{d^{(i)}(s)}{d^{(i-1)}(s)} \tag{8.13}$$

and

$$H^{(i)}(s) = \frac{n^{(i)}(s)}{d^{(i-1)}(s)}. \tag{8.14}$$

The function $H^{(i)}(s)$ can be interpreted as the model transfer function estimated at iteration $i$ by the minimization of (8.12). Notably, this transfer function is made by the numerator $n^{(i)}(s)$ from the current iteration (to be found), and by the denominator $d^{(i-1)}(s)$ from the previous iteration (already known). This approximation arises from the linearization of the error function, since it indeed avoids the presence of unknowns in the denominator. The function $w^{(i)}(s)$ can be interpreted as a frequency-dependent weight which multiplies the given samples $H_k$. This weighting function has two purposes:
- providing a new estimate of denominator $d^{(i)}(s)$, and
- compensating for the approximation introduced by fixing the denominator of $H^{(i)}(s)$ to the previous iteration value. Indeed, weight $w^{(i)}(s)$ depends on the ratio between new denominator estimate $d^{(i)}(s)$ and previous estimate $d^{(i-1)}(s)$.

Next, we derive alternative expressions for $w^{(i)}(s)$ and $H^{(i)}(s)$, which pave the way for an implicit weighting of (8.12). Substituting (8.10) and (8.11) into (8.13) we can derive the following chain of equalities:

$$w^{(i)}(s) = \frac{1 + \sum_{n=1}^{\tilde n} \frac{d_n^{(i)}}{s - p_n^{(0)}}}{1 + \sum_{n=1}^{\tilde n} \frac{d_n^{(i-1)}}{s - p_n^{(0)}}} = \frac{\prod (s - p_n^{(i)}) \big/ \prod (s - p_n^{(0)})}{\prod (s - p_n^{(i-1)}) \big/ \prod (s - p_n^{(0)})} = 1 + \sum_{n=1}^{\tilde n} \frac{w_n^{(i)}}{s - p_n^{(i-1)}}, \tag{8.15}$$

where $\prod = \prod_{n=1}^{\tilde n}$. In (8.15), $p_n^{(i)}$ are the zeros of $d^{(i)}(s)$, and therefore the poles of $H^{(i+1)}(s)$. From the second expression in (8.15), we see that $w^{(i)}(s)$ is the ratio of two rational functions with the same poles $p_n^{(0)}$. By factorizing their respective numerators and denominators, as in the third expression, we observe that those common poles can be eliminated. Finally, we express $w^{(i)}(s)$ in terms of a new set of poles $p_n^{(i-1)}$ that change at every iteration, as in the last expression in (8.15). The same manipulation can be performed on $H^{(i)}(s)$, leading to

$$H^{(i)}(s) = \frac{c_0^{(i)} + \sum_{n=1}^{\tilde n} \frac{c_n^{(i)}}{s - p_n^{(0)}}}{1 + \sum_{n=1}^{\tilde n} \frac{d_n^{(i-1)}}{s - p_n^{(0)}}} = r_0^{(i)} + \sum_{n=1}^{\tilde n} \frac{r_n^{(i)}}{s - p_n^{(i-1)}}. \tag{8.16}$$

Substituting (8.15) and (8.16) into (8.12), we obtain [42, 35]

$$(e_{SK}^{(i)})^2 = \frac{1}{\bar k} \sum_{k=1}^{\bar k} \left| H_k \left( 1 + \sum_{n=1}^{\tilde n} \frac{w_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right) - \left( r_0^{(i)} + \sum_{n=1}^{\tilde n} \frac{r_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right) \right|^2, \tag{8.17}$$

which is the actual error function used in VF to fit the model to the given samples.

The main difference between (8.17) and (8.9) is how the linearized error is iteratively weighted to progressively converge to (8.6). In (8.12), the weight $\frac{1}{d^{(i-1)}(j\omega_k)}$ is applied explicitly, which degrades numerical conditioning. In (8.17), instead, the same weight is applied implicitly by relocating the poles $p_n^{(i-1)}$ at each iteration.

Once (8.17) has been minimized, the updated poles $p_n^{(i)}$ for the next iteration can be found as the zeros of $d^{(i)}(s)$, which are also the zeros of $w^{(i)}(s)$, as one can see from the third expression in (8.15). It can be shown that such zeros can be calculated as the eigenvalues of a matrix [42]

$$\{p_n^{(i)}\} = \operatorname{eig}\left( A^{(i-1)} - b_w (c_w^{(i)})^T \right), \tag{8.18}$$

with $A^{(i-1)} = \operatorname{diag}\{p_1^{(i-1)}, \ldots, p_{\tilde n}^{(i-1)}\}$ being a diagonal matrix formed by the poles $p_n^{(i-1)}$. In (8.18), $b_w$ is a $\tilde n \times 1$ vector of ones, and $(c_w^{(i)})^T = [w_1^{(i)}, \ldots, w_{\tilde n}^{(i)}]$. Upon convergence, $p_n^{(i-1)} \to p_n^{(i)}$, and they become the poles of the obtained model $H^{(i)}(s)$. When this happens, $w^{(i)}(s) \to 1$, as we can see from the third expression in (8.15), and the linearized error (8.12) tends to (8.6), as desired.
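The eigenvalue formula (8.18) can be verified on a small worked example. The function name `relocate_poles` is hypothetical; the example uses the identity $(s+3)(s+4)/((s+1)(s+2)) = 1 + 6/(s+1) - 2/(s+2)$, so the zeros of the weighting function are known in advance.

```python
import numpy as np


def relocate_poles(poles, w):
    """Zeros of 1 + sum_n w_n / (s - p_n), computed as eig(A - b c^T)."""
    poles = np.asarray(poles, dtype=complex)
    w = np.asarray(w, dtype=complex)
    A = np.diag(poles)                       # diagonal matrix of the current poles
    b = np.ones((len(poles), 1))             # column vector of ones
    return np.linalg.eigvals(A - b @ w[None, :])


# Worked example: current poles -1, -2 and residues 6, -2 give zeros -3, -4
new_poles = relocate_poles([-1.0, -2.0], [6.0, -2.0])
```

The identity behind the formula is the matrix determinant lemma: $\det(sI - A + b c^T) = \det(sI - A)\,(1 + c^T (sI - A)^{-1} b)$, and for diagonal $A$ and $b$ a vector of ones the second factor is exactly $1 + \sum_n w_n/(s - p_n)$.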


8.3.2 The Vector Fitting algorithm 


We are now ready to present the complete VF algorithm [42, 35], with a pseudo-code implementation available in Algorithm 8.1. The first step is to choose the order $\tilde n$ of the desired model. This choice will be discussed in Section 8.3.11.1. Next, we set the initial poles $p_n^{(0)}$ of the basis functions in (8.16) and (8.15). Numerical tests [42] showed that a linear distribution of poles with small and negative real part over the bandwidth spanned by samples $H_k$ leads to the best conditioning of the least squares problems to be solved. We assume $\tilde n$ even, and frequency values $\omega_k$ sorted in ascending order. If $\omega_1 = 0$, the initial poles can be set as [35]

$$p_n^{(0)} = \begin{cases} (-\alpha + j)\, \dfrac{\omega_{\bar k}}{\tilde n/2}\, n & \text{for } n = 1, \ldots, \tilde n/2, \\[2mm] \left( p_{n - \tilde n/2}^{(0)} \right)^* & \text{for } n = \tilde n/2 + 1, \ldots, \tilde n, \end{cases} \tag{8.19}$$
Algorithm 8.1: Vector Fitting.

Require: response samples $H_k$, corresponding frequencies $\omega_k$ ($k = 1, \ldots, \bar k$)
Require: desired model order $\tilde n$
Require: maximum number of iterations $i_{max}$

1: set initial poles $p_n^{(0)}$ according to (8.19) or (8.20)
2: $i \leftarrow 1$
3: while $i < i_{max}$ do
4:   Solve (8.21) or (8.38) in least squares sense
5:   Compute the new poles estimate $p_n^{(i)}$ with (8.18)
6:   Enforce poles stability with (8.71), if desired {Stability enforcement}
7:   if (8.27) is true then {First convergence test}
8:     Solve (8.29) or (8.40) in least squares sense {Tentative final fitting}
9:     Compute fitting error $e$ with (8.6) or (8.34)
10:    if $e < \varepsilon_H$ then {Second convergence test}
11:      $H(s) = H^{(i)}(s)$
12:      return Success!
13:    end if
14:  end if
15:  $i \leftarrow i + 1$
16: end while
17: return Failure: maximum number of iterations reached.


where $^*$ denotes the complex conjugate and $\alpha$ is typically set to 0.01. This rule generates $\tilde n/2$ pairs of complex conjugate poles, linearly distributed over the frequency range $[0, \omega_{\bar k}]$ spanned by samples $H_k$. The imaginary part of the poles is set to be much larger than the real part, since this makes the partial fraction basis functions clearly distinct from each other, which improves numerical conditioning.

When $\omega_1 \ne 0$, the distribution (8.19) can be modified as [35]

$$p_n^{(0)} = \begin{cases} (-\alpha + j) \left[ \omega_1 + \dfrac{\omega_{\bar k} - \omega_1}{\tilde n/2 - 1} (n - 1) \right] & \text{for } n = 1, \ldots, \tilde n/2, \\[2mm] \left( p_{n - \tilde n/2}^{(0)} \right)^* & \text{for } n = \tilde n/2 + 1, \ldots, \tilde n, \end{cases} \tag{8.20}$$

to linearly spread the poles between $\omega = \omega_1$ and $\omega = \omega_{\bar k}$. Rules (8.19) and (8.20) work well for most cases, since the choice of initial poles is typically not critical for VF convergence. When the frequency range of interest spans several decades, and the system frequency response exhibits significant behavior in multiple decades, initial poles can be distributed logarithmically for optimal results [35].
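A linear distribution of starting poles of this kind can be generated as follows. This is a sketch under assumptions: the function name `initial_poles` is hypothetical, the imaginary parts are taken evenly spaced over the sampled band (starting above zero when the first frequency is zero, so that no pole sits at the origin), and each real part is set to $-\alpha$ times the corresponding imaginary part.

```python
import numpy as np


def initial_poles(omega, order, alpha=0.01):
    """Complex-conjugate starting poles, linearly spread over the sampled band."""
    half = order // 2                       # order is assumed even
    w_min, w_max = omega[0], omega[-1]      # omega is assumed sorted ascending
    if w_min == 0:
        beta = w_max / half * np.arange(1, half + 1)    # avoids a pole at the origin
    else:
        beta = np.linspace(w_min, w_max, half)          # spread between the band edges
    upper = (-alpha + 1j) * beta            # real part = -alpha * imaginary part
    return np.concatenate([upper, upper.conj()])


# Hypothetical band from 0 to 10 rad/s, model order 4
poles = initial_poles(np.linspace(0.0, 10.0, 101), order=4)
```

For a band spanning several decades, `np.linspace` could be replaced by `np.logspace` over the exponents of the band edges to obtain the logarithmic distribution mentioned above.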

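The core of the VF algorithm is an iterative minimization of (8.17), which begins with $i = 1$. Minimizing (8.17) is equivalent to solving, in the least-squares sense, the system of equations

\[
\begin{bmatrix} \Phi_0^{(i)} & -D_H \Phi_1^{(i)} \end{bmatrix}
\begin{bmatrix} c_H^{(i)} \\ c_w^{(i)} \end{bmatrix} = V_H
\tag{8.21}
\]

where $\Phi_0^{(i)}$ and $\Phi_1^{(i)}$ contain the partial fraction basis functions evaluated at the different frequency points $\omega_k$:

\[
\Phi_0^{(i)} =
\begin{bmatrix}
1 & \dfrac{1}{j\omega_1 - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_1 - p_{\bar{n}}^{(i-1)}} \\
\vdots & \vdots & & \vdots \\
1 & \dfrac{1}{j\omega_{\bar{k}} - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_{\bar{k}} - p_{\bar{n}}^{(i-1)}}
\end{bmatrix}
\tag{8.22}
\]

\[
\Phi_1^{(i)} =
\begin{bmatrix}
\dfrac{1}{j\omega_1 - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_1 - p_{\bar{n}}^{(i-1)}} \\
\vdots & & \vdots \\
\dfrac{1}{j\omega_{\bar{k}} - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_{\bar{k}} - p_{\bar{n}}^{(i-1)}}
\end{bmatrix}
\tag{8.23}
\]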


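and $D_H = \operatorname{diag}\{H_1, \dots, H_{\bar{k}}\}$. The right-hand side of (8.21) is a column vector formed by the given samples

\[
V_H = \begin{bmatrix} H_1 & \cdots & H_{\bar{k}} \end{bmatrix}^T
\tag{8.24}
\]

while $c_H^{(i)}$ and $c_w^{(i)}$ contain the unknown coefficients

\[
c_H^{(i)} = \begin{bmatrix} r_0^{(i)} & \cdots & r_{\bar{n}}^{(i)} \end{bmatrix}^T
\tag{8.25}
\]

\[
c_w^{(i)} = \begin{bmatrix} w_1^{(i)} & \cdots & w_{\bar{n}}^{(i)} \end{bmatrix}^T
\tag{8.26}
\]

System (8.21) can be solved in the least-squares sense with a QR decomposition of the coefficient matrix [28]. Once (8.21) has been solved, the new poles estimate $p_n^{(i)}$ is computed with (8.18).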
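A single pole-relocation step can be sketched as follows. This is a minimal, complex-arithmetic illustration rather than a reference implementation: the function name `relocate_poles` is hypothetical, NumPy's generic least-squares solver stands in for the QR-based solve, and the new poles are taken as the zeros of the weighting function, computed as the eigenvalues of $A - b\,c_w^T$ with $A$ the diagonal matrix of current poles and $b$ a vector of ones (assumed here to be the form of (8.18)).

```python
import numpy as np

def relocate_poles(omega, H, poles):
    """One pole-relocation step: solve (8.21), then move poles to the zeros of w(s)."""
    s = 1j * np.asarray(omega, dtype=float)
    H = np.asarray(H, dtype=complex)
    poles = np.asarray(poles, dtype=complex)
    n = poles.size
    # Phi1[k, n] = 1 / (j*omega_k - p_n); Phi0 adds a leading column of ones.
    Phi1 = 1.0 / (s[:, None] - poles[None, :])
    Phi0 = np.hstack([np.ones((s.size, 1)), Phi1])
    # Coefficient matrix [Phi0, -D_H Phi1]; D_H Phi1 scales row k by H_k.
    M = np.hstack([Phi0, -H[:, None] * Phi1])
    x, *_ = np.linalg.lstsq(M, H, rcond=None)
    c_w = x[n + 1:]
    # Zeros of w(s) = 1 + sum_n w_n / (s - p_n) are the eigenvalues of A - b c_w^T.
    A = np.diag(poles)
    b = np.ones((n, 1))
    return np.linalg.eigvals(A - b @ c_w[None, :])

# Synthetic, noise-free samples of an order-4 rational function (illustrative values).
true_poles = np.array([-0.5 + 3j, -0.5 - 3j, -1.0 + 7j, -1.0 - 7j])
residues = np.array([1 - 0.5j, 1 + 0.5j, 0.3 + 0.2j, 0.3 - 0.2j])
omega = np.linspace(0.1, 10.0, 200)
H = 0.2 + (residues / (1j * omega[:, None] - true_poles)).sum(axis=1)
start = np.array([-0.025 + 2.5j, -0.025 - 2.5j, -0.1 + 10j, -0.1 - 10j])
new_poles = relocate_poles(omega, H, start)
```

In this small and well-conditioned setting the samples are exactly rational and the model order matches, so the linear problem has an exact solution and one step already moves the starting poles onto the true ones up to round-off. With noisy data or higher orders, several iterations are normally needed.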

The VF iterative process usually converges very quickly, often in 4—5 iterations, 
except when the given samples are noisy. The fast and reliable convergence of VF is 
truly remarkable considering that VF ultimately solves a nonlinear minimization prob- 
lem. Unfortunately, so far no one has been able to support this experimental evidence 
with strong theoretical results on VF convergence. Actually, contrived examples show 
that VF convergence is not guaranteed [53, 72]. However, these examples are quite ar- 
tificial and far from practical datasets. Two decades of widespread use indeed show 
that, when properly implemented, VF is a remarkably robust algorithm for the iden- 
tification of reduced order models from sampled data. In VF, convergence is typically 
monitored with three conditions:

1. When the poles estimate stabilizes, i.e. $p_n^{(i)} \approx p_n^{(i-1)}$, performing new iterations will not improve accuracy. When this happens, $w^{(i)}(j\omega) \approx 1$ for $\omega \in [\omega_1, \omega_{\bar{k}}]$. This occurrence can be tested numerically as

\[
\frac{1}{\bar{k}} \sum_{k=1}^{\bar{k}} \left| w^{(i)}(j\omega_k) - 1 \right|^2 < \varepsilon_w,
\tag{8.27}
\]

where $\varepsilon_w$ is a user-defined threshold. The advantage of criterion (8.27) is that it does not require additional computations apart from the calculation of the norm in (8.27). The limitation of this test is that it is based on an empirical condition, which may be satisfied even when the reduced model $H^{(i)}(s)$ still does not fit the given frequency samples well. Conversely, when samples $H_k$ are noisy, test (8.27) may fail even when additional iterations will not significantly improve the poles estimate [33]. Therefore, test (8.27) should only be used as a low-cost check to decide whether it is worth computing the error between the reduced model response and samples $H_k$.

2. When condition (8.27) is satisfied, the error between the fitted model and samples $H_k$ should be checked. In principle, this can be done by computing the error between (8.16) and $H_k$. However, since after solving (8.21) a new estimate of the poles can be found via (8.18), the common practice is to use those poles to fit a new model. This is done by minimizing the exact error (8.6) between the given samples $H_k$ and model

\[
H^{(i+1)}(s) = r_0^{(i+1)} + \sum_{n=1}^{\bar{n}} \frac{r_n^{(i+1)}}{s - p_n^{(i)}}
\tag{8.28}
\]

considering only residues $r_0^{(i+1)}, \dots, r_{\bar{n}}^{(i+1)}$ as unknowns. Since poles $p_n^{(i)}$ are now fixed, this is equivalent to solving, in the least-squares sense, the linear system

\[
\Phi_0^{(i+1)} c_H^{(i+1)} = V_H.
\tag{8.29}
\]

The VF iteration ends, successfully, when

\[
e < \varepsilon_H,
\tag{8.30}
\]

since model (8.28) meets the accuracy threshold $\varepsilon_H$ set by the user. The main reason for performing this additional fitting step is that it minimizes the exact error (8.6) between model and given samples, rather than a linear approximation like (8.17), which improves accuracy and more reliably detects convergence. Therefore, solving (8.29) serves both as convergence test and as final fitting of the model.

3. In selected circumstances, VF may be unable to reach (8.30) even after many iterations. In this case, the iterative process concludes unsuccessfully when $i$ exceeds the maximum number of iterations $i_{\max}$ allowed by the user.
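The tentative final fitting (8.29) and the subsequent error check can be sketched as below. The helper names `fit_residues` and `rms_error` are hypothetical, and the root-mean-square deviation is only an assumed stand-in for the error measure (8.6); any other norm of the sample-model mismatch could be substituted.

```python
import numpy as np

def fit_residues(omega, H, poles):
    """Least-squares solve of (8.29): poles are fixed, r_0 and the residues are unknown."""
    s = 1j * np.asarray(omega, dtype=float)
    poles = np.asarray(poles, dtype=complex)
    # Phi0 = [1, 1/(j*omega_k - p_1), ..., 1/(j*omega_k - p_n)]
    Phi0 = np.hstack([np.ones((s.size, 1)), 1.0 / (s[:, None] - poles[None, :])])
    c_H, *_ = np.linalg.lstsq(Phi0, np.asarray(H, dtype=complex), rcond=None)
    return c_H, Phi0 @ c_H          # coefficients and model response at the samples

def rms_error(H, model):
    """Root-mean-square deviation between samples and model response (assumed error measure)."""
    return np.sqrt(np.mean(np.abs(np.asarray(H) - model) ** 2))

# Illustrative data: samples of a known order-2 function, fitted with its exact poles.
poles = np.array([-1.0 + 4j, -1.0 - 4j])
omega = np.linspace(0.5, 8.0, 60)
H = 0.1 + (0.5 - 0.2j) / (1j * omega - poles[0]) + (0.5 + 0.2j) / (1j * omega - poles[1])
c_H, model = fit_residues(omega, H, poles)
err = rms_error(H, model)
converged = err < 1e-6              # second convergence test, with a user-chosen threshold
```

Because the poles are fixed, the problem is linear in the unknowns, which is why a single least-squares solve suffices at this step.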
8.3.3 Example: fitting a rational transfer function


In this example, we apply VF to a set of samples $H_k$ generated from a known rational function of order 10. Its poles were generated randomly, and are reported in Table 8.1. The original transfer function was sampled at $\bar{k} = 100$ frequency points linearly spaced between $\omega_1 = 0.1$ rad/s and $\omega_{100} = 10$ rad/s. A Matlab implementation of VF [75] was used to fit the samples with a model in the form (8.16) with order $\bar{n} = 10$. The initial distribution of poles $p_n^{(0)}$ set by (8.20) is depicted in the left panel of Figure 8.1. Throughout the VF iterations, poles relocate to the final distribution shown in the right panel of Figure 8.1, which also compares them to the exact poles of the original rational function. We can see that the poles estimated by VF closely match the poles of the original system.


Table 8.1: Example of Section 8.3.3: poles and residues of the transfer function used to generate samples $H_k$.

Pole                               Residue
(constant term)                    r_0 = 0.1059
p_1 = -1.3578                      r_1 = -0.2808
p_2 = -1.2679                      r_2 = 0.1166
p_{3,4} = -1.4851 ± 0.2443j        r_{3,4} = 0.9569 ∓ 0.7639j
p_{5,6} = -0.8487 ± 2.9019j        r_{5,6} = 0.9357 ∓ 0.7593j
p_{7,8} = -0.8587 ± 3.1752j        r_{7,8} = 0.4579 ∓ 0.7406j
p_{9,10} = -0.2497 ± 6.5369j       r_{9,10} = 0.2405 ∓ 0.7437j

Figure 8.1: Left panel: initial poles $p_n^{(0)}$ used by VF in the first iteration. Right panel: poles of the final model $H(s)$ compared to the exact poles of the original transfer function (horizontal axis: $\operatorname{Re}\{p_n\}$).


Figure 8.2: Example of Section 8.3.3: magnitude (top) and phase (bottom) of samples $H_k$ and of the model $H(j\omega)$ identified by VF.

In Figure 8.2, the frequency response $H(j\omega)$ of the VF model is compared to the initial samples. We observe excellent agreement over the entire frequency range of interest. At the conclusion of the VF iterative process, the worst-case error between samples $H_k$ and model response

\[
e_\infty = \max_k \left| H_k - H(j\omega_k) \right|
\tag{8.31}
\]


is 2.37 x 10 ““. Figure 8.3 shows the evolution of e,, throughout the five iterations per- 
formed by VF, plus a final iteration (i = 6) where poles were kept fixed and residues 
were calculated one more time using (8.29). The figure shows that VF converges very 
quickly, reaching an error below 10° in only three iterations. We can also observe that 
the final fitting iteration (i = 6) with fixed poles provides a more accurate model. For 
this example, VF took only 0.2 s of CPU time on a 2.2 GHz mobile processor. The source 
codes related to this example can be downloaded from [75].

Figure 8.3: Example of Section 8.3.3: worst-case fitting error $e_\infty$ as a function of iteration counter $i$. The last iteration ($i = 6$) was performed with fixed poles.


8.3.4 Example: modeling of aortic input impedance


In this example, VF is used to model the relation between pressure $p(t)$ and flow rate $q(t)$ in the ascending aorta of a 1.1-year-old patient [71, patient 1]. Simultaneous pressure and flow rate measurements were collected during a surgical procedure. The blood flow rate was measured with an ultrasonic flow probe positioned about 1 cm downstream of the aortic valve. The pressure was acquired using a catheter with a pressure transducer on its tip, positioned in the same location as the flow rate probe. From time domain recordings, the input impedance seen from the aorta was obtained:

\[
Z(j\omega) = \frac{\mathcal{F}\{p(t)\}}{\mathcal{F}\{q(t)\}}
\tag{8.32}
\]

where $\mathcal{F}\{\cdot\}$ denotes the Fourier transform. Impedance was computed at $\bar{k} = 11$ frequency points $\omega_k = 2\pi (k - 1) f_0$ for $k = 1, \dots, 11$, where $f_0 = 2.54$ Hz = 152.4 beats/min corresponds to the heart rate of the patient. The authors of [71] estimate that the impedance measurements are affected by uncertainty with a relative standard deviation that ranges between 0.66% and 14.5% depending on frequency. The relative standard deviation was normalized to $|Z(0)|$.
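The ratio (8.32) can be illustrated on synthetic data. The sketch below is not the processing used in [71]: the waveforms, the sampling choice and the function name `impedance_at_harmonics` are all assumptions made for illustration. It relies on the recording covering exactly one heartbeat, so that bin $k$ of the discrete Fourier transform coincides with the $k$-th harmonic of the heart rate.

```python
import numpy as np

def impedance_at_harmonics(p, q, n_harmonics):
    """Z at DC and the first harmonics, as the ratio of the DFTs of p(t) and q(t).

    p and q must cover an integer number of heartbeats (here: exactly one),
    so that DFT bin k coincides with the k-th harmonic of the heart rate.
    """
    P = np.fft.rfft(p)
    Q = np.fft.rfft(q)
    return P[:n_harmonics] / Q[:n_harmonics]

f0 = 2.54                                            # heart rate in Hz, from the text
t = np.arange(256) / (256 * f0)                      # one period, 256 samples
q = 1.0 + np.cos(2 * np.pi * f0 * t)                 # synthetic flow rate
p = 2.0 + 3.0 * np.cos(2 * np.pi * f0 * t + 0.5)     # synthetic pressure
Z = impedance_at_harmonics(p, q, n_harmonics=2)      # Z at 0 Hz and at f0
```

Only harmonics actually present in the flow signal give a meaningful ratio; at frequencies where the flow spectrum is close to zero the division amplifies noise, which is consistent with the frequency-dependent uncertainty reported for the measured impedance.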

We apply VF to the impedance samples to obtain a closed-form model relating aortic pressure and flow rate. The limited number of available samples and their uncertainty make the identification of an accurate model challenging. We use this nontrivial scenario to explore the relation between number and quality of the available samples, model order $\bar{n}$, and accuracy. Vector Fitting was applied to the given samples four times with model order $\bar{n}$ increasing from 2 to 8 in steps of 2. Figure 8.4 compares the magnitude and phase of the identified model to the original impedance samples. We can see that the $\bar{n} = 2$ model captures the overall trend of the impedance. However, it fails to resolve the increase in impedance at f = 12.7 Hz and the associated phase variation. Increasing the order to 4 or 6 resolves that feature and provides higher accuracy. Further increasing order $\bar{n}$ to 8 leads to a model which closely matches most given samples, but has a sharp and high peak at f = 12.3 Hz. This unrealistic behavior in between the given samples is typical of an overfitting scenario, where the sought model has too many degrees of freedom, which can hardly be estimated from the information contained in the available samples. Although still solvable, the condition number of (8.21) degrades. The system solution, which gives the model coefficients, becomes very sensitive to the noise superimposed on the given samples. The source codes related to this example can be downloaded from [75].

Figure 8.4: Impedance seen into the ascending aorta of the pediatric patient considered in Section 8.3.4: measured samples (circles) and response of four different VF models (dashed lines) of order $\bar{n}$ = 2, 4, 6, 8 (from top to bottom).


8.3.5 The multi-input multi-output case


The VF algorithm presented in Section 8.3.2 for the single-input single-output case can be easily extended to the general case of a system with $\bar{m}$ inputs and $\bar{q}$ outputs. In this case, the given samples are $\bar{q} \times \bar{m}$ complex matrices $H_k$, and we denote their $(q, m)$ entry as $H_{k,qm}$. The model transfer function is now defined as

\[
H^{(i)}(s) = R_0^{(i)} + \sum_{n=1}^{\bar{n}} \frac{R_n^{(i)}}{s - p_n^{(i-1)}}
\tag{8.33}
\]

where $R_n^{(i)} \in \mathbb{C}^{\bar{q} \times \bar{m}}$. In (8.33), the same poles $p_n^{(i-1)}$ are used for all elements of matrix $H^{(i)}(s)$. This choice is appropriate when modeling linear dynamical systems, since it is known that the poles of each transfer function entry are a subset of a common set of poles shared by all transfer function elements. The physical justification of this fact is that poles are related to the natural modes of the system, which are a property of the system itself and not of individual entries of its transfer function. In other communities, natural modes are referred to as resonances or eigenmodes of the system. When VF is applied to model transfer functions not related to the same physical system, one should use distinct poles for different elements of (8.33). This scenario is discussed in [35], which also elaborates on the computational implications of this choice.
In the multi-input multi-output case, weighting function $w^{(i)}(s)$ remains defined by (8.15). Since the transfer function (8.33) is now matrix-valued, VF aims to minimize the error functional

\[
e^2 = \frac{1}{\bar{k}\,\bar{q}\,\bar{m}} \sum_{k=1}^{\bar{k}} \left\| H_k - H(j\omega_k) \right\|_F^2
\tag{8.34}
\]

where $\|\cdot\|_F$ denotes the Frobenius norm, which for $A \in \mathbb{C}^{\bar{q} \times \bar{m}}$ is defined as

\[
\|A\|_F = \sqrt{\sum_{q=1}^{\bar{q}} \sum_{m=1}^{\bar{m}} |A_{qm}|^2}.
\tag{8.35}
\]

From (8.35), we see that the square of the Frobenius norm is simply equal to the sum of the squared magnitudes of each entry. Therefore, minimizing (8.34) means minimizing the sum of the squared error between each sample $H_{k,qm}$ and the corresponding entry of (8.33).

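The minimization of (8.34) is a nonlinear least-squares problem, which VF solves iteratively by working on the linearized error [42]

\[
\left(e^{(i)}\right)^2 = \frac{1}{\bar{k}\,\bar{q}\,\bar{m}} \sum_{k=1}^{\bar{k}}
\left\| H_k \left( 1 + \sum_{n=1}^{\bar{n}} \frac{w_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right)
- \left( R_0^{(i)} + \sum_{n=1}^{\bar{n}} \frac{R_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right) \right\|_F^2
\tag{8.36}
\]

As in the single-input single-output case, we can see that (8.36) uses weighting function $w^{(i)}(s)$ to offset the error introduced by using the previous poles estimate in the denominators. Minimizing (8.36) is equivalent to solving, in the least-squares sense, the system of equations

\[
R_{0,qm}^{(i)} + \sum_{n=1}^{\bar{n}} \frac{R_{n,qm}^{(i)}}{j\omega_k - p_n^{(i-1)}}
- H_{k,qm} \sum_{n=1}^{\bar{n}} \frac{w_n^{(i)}}{j\omega_k - p_n^{(i-1)}} = H_{k,qm}
\tag{8.37}
\]

for $k = 1, \dots, \bar{k}$, $q = 1, \dots, \bar{q}$ and $m = 1, \dots, \bar{m}$. In matrix form, equations (8.37) read

\[
\begin{bmatrix}
\Phi_0^{(i)} & 0 & \cdots & 0 & -D_{H_{11}} \Phi_1^{(i)} \\
0 & \Phi_0^{(i)} & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & \Phi_0^{(i)} & -D_{H_{\bar{q}\bar{m}}} \Phi_1^{(i)}
\end{bmatrix}
\begin{bmatrix}
c_{H_{11}}^{(i)} \\ \vdots \\ c_{H_{\bar{q}\bar{m}}}^{(i)} \\ c_w^{(i)}
\end{bmatrix}
=
\begin{bmatrix}
V_{H_{11}} \\ \vdots \\ V_{H_{\bar{q}\bar{m}}}
\end{bmatrix}
\tag{8.38}
\]

where $D_{H_{qm}}$ and $V_{H_{qm}}$ are, respectively, a diagonal matrix and a column vector formed by all samples $H_{k,qm}$ for $k = 1, \dots, \bar{k}$. In the unknown vector of (8.38),

\[
c_{H_{qm}}^{(i)} = \begin{bmatrix} R_{0,qm}^{(i)} & \cdots & R_{\bar{n},qm}^{(i)} \end{bmatrix}^T,
\tag{8.39}
\]

and $c_w^{(i)}$ is defined by (8.26). System (8.38) is solved in step 4 of Algorithm 8.1. In step 8, a tentative final fitting of the model is performed, assuming fixed poles and determining only a new estimate of residues $R_n^{(i+1)}$. This step can be achieved by solving

\[
\Phi_0^{(i+1)} c_{H_{qm}}^{(i+1)} = V_{H_{qm}}
\tag{8.40}
\]

for $q = 1, \dots, \bar{q}$ and $m = 1, \dots, \bar{m}$.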


8.3.6 The fast Vector Fitting algorithm 


As the number of inputs $\bar{m}$ and outputs $\bar{q}$ increases, the computational cost of solving (8.38) can quickly become unsustainable. As technology evolves, this scenario arises more frequently, as engineers need to model systems of increasing complexity, either in terms of dynamic order or number of inputs and outputs. For example, a modern server processor has about 2,000 pins, which are connected to the motherboard by a dense network of tiny wires realized on the chip package. Seen as an input-output system, this network will have about 4,000 inputs and outputs, half where the network connects to the motherboard, and half where the network is connected to the silicon die. The need to predict electromagnetic interference in this dense and intricate network of wires calls for scalable algorithms to create reduced order models for systems where the number of inputs $\bar{m}$ and outputs $\bar{q}$ can be several thousands [70, 8].

The Fast VF algorithm [22, 47] significantly reduces the cost of solving (8.38) for multi-input and multi-output systems. Savings are achieved by exploiting the block structure of (8.38) and the fact that, of the solution vector of (8.38), only $c_w^{(i)}$ is actually needed to compute the new poles estimate (8.18). A least-squares problem in the form of (8.38) can be efficiently solved by first performing the QR decompositions [29, 9, 86, 22]

\[
\begin{bmatrix} \Phi_0^{(i)} & -D_{H_{qm}} \Phi_1^{(i)} \end{bmatrix}
=
\begin{bmatrix} Q_{qm}^{1} & Q_{qm}^{2} \end{bmatrix}
\begin{bmatrix} R_{qm}^{11} & R_{qm}^{12} \\ 0 & R_{qm}^{22} \end{bmatrix}
\tag{8.41}
\]

for $q = 1, \dots, \bar{q}$ and $m = 1, \dots, \bar{m}$. Then a reduced system is formed [22]:

\[
\begin{bmatrix} R_{11}^{22} \\ \vdots \\ R_{\bar{q}\bar{m}}^{22} \end{bmatrix} c_w^{(i)}
=
\begin{bmatrix} (Q_{11}^{2})^H V_{H_{11}} \\ \vdots \\ (Q_{\bar{q}\bar{m}}^{2})^H V_{H_{\bar{q}\bar{m}}} \end{bmatrix}
\tag{8.42}
\]

where $^H$ denotes the conjugate transpose, also known as Hermitian transpose. System (8.42) is solved in the least-squares sense to determine $c_w^{(i)}$, and compute the new poles estimate with (8.18). Computational savings arise from the fact that the size of the matrices involved in (8.41) and (8.42) is much smaller than the size of the coefficient matrix in (8.38). Furthermore, since the $\bar{q}\bar{m}$ QR decompositions (8.41) are independent, they can be performed in parallel [14]. The Fast VF algorithm with parallelization can identify reduced models for systems with hundreds of inputs and outputs in minutes [35]. A pseudo-code of a real-valued implementation of the Fast VF algorithm will be given in Section 8.3.8.
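The equivalence between the reduced system (8.42) and the full block system (8.38) can be checked numerically with the sketch below. It is a complex-arithmetic illustration only: the function names `basis`, `cw_fast` and `cw_full` are hypothetical, the samples are random numbers rather than a physical response, and the ordering of the $(q, m)$ blocks is an arbitrary choice that does not affect $c_w^{(i)}$.

```python
import numpy as np

def basis(omega, poles):
    """Phi1[k, n] = 1 / (j*omega_k - p_n); Phi0 prepends a column of ones."""
    Phi1 = 1.0 / (1j * omega[:, None] - poles[None, :])
    Phi0 = np.hstack([np.ones((omega.size, 1)), Phi1])
    return Phi0, Phi1

def cw_fast(omega, H, poles):
    """c_w from the reduced system (8.42); H has shape (k, q, m)."""
    Phi0, Phi1 = basis(omega, poles)
    n0 = Phi0.shape[1]
    R_blocks, rhs_blocks = [], []
    for q in range(H.shape[1]):
        for m in range(H.shape[2]):
            Hqm = H[:, q, m]
            # Thin QR of [Phi0, -D_H Phi1] for this entry, as in (8.41).
            Q, R = np.linalg.qr(np.hstack([Phi0, -Hqm[:, None] * Phi1]))
            R_blocks.append(R[n0:, n0:])                    # R^22 block
            rhs_blocks.append(Q[:, n0:].conj().T @ Hqm)     # (Q^2)^H V_H
    c_w, *_ = np.linalg.lstsq(np.vstack(R_blocks), np.concatenate(rhs_blocks), rcond=None)
    return c_w

def cw_full(omega, H, poles):
    """c_w from the full block system (8.38), for comparison only."""
    Phi0, Phi1 = basis(omega, poles)
    k, nq, nm = H.shape
    n0, n1 = Phi0.shape[1], Phi1.shape[1]
    M = np.zeros((k * nq * nm, n0 * nq * nm + n1), dtype=complex)
    rhs = np.zeros(k * nq * nm, dtype=complex)
    for idx, (q, m) in enumerate(np.ndindex(nq, nm)):
        rows = slice(idx * k, (idx + 1) * k)
        M[rows, idx * n0:(idx + 1) * n0] = Phi0             # diagonal block
        M[rows, -n1:] = -H[:, q, m][:, None] * Phi1         # last block column
        rhs[rows] = H[:, q, m]
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x[-n1:]

rng = np.random.default_rng(0)
omega = np.linspace(0.5, 10.0, 40)
poles = np.array([-0.1 + 2j, -0.1 - 2j, -0.3 + 6j, -0.3 - 6j])
H = rng.standard_normal((40, 2, 2)) + 1j * rng.standard_normal((40, 2, 2))
c_fast = cw_fast(omega, H, poles)
c_full = cw_full(omega, H, poles)
```

The full matrix has one block column per transfer function entry, whereas the reduced system only stacks small triangular blocks, which is where the savings come from when the number of ports grows. The loop over entries is also the natural place to parallelize.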

Several other ideas were proposed to increase VF scalability for large input and output counts. In VF with compression, samples $H_k$ are “compressed” with a singular value decomposition, reducing the cost of the subsequent fitting [37] and passivity enforcement steps [63]. The Loewner method [52, 46], which is an alternative to VF for the data-driven modeling of linear systems, was also shown to scale favorably with respect to the number of inputs and outputs. This class of techniques is the subject of Chapter 6 of this volume.


8.3.7 Example: modeling of a multiport interconnect on a printed 
circuit board 


Vector Fitting is extensively used by electronic designers to model how high-speed dig- 
ital signals propagate over a printed circuit board, and design the system accordingly. 
We consider the structure shown in Figure 8.5, which consists of several copper traces 
realized on the top face of a high-performance printed circuit board (Wild River Tech- 
nology CMP-28 [88]). This structure mimics, in a simplified way, the multiwire buses 
that may connect the CPU and memory of a high-performance server. At the end of each trace, an electrical port is defined between the trace endpoint and a reference point on the ground plane underneath. The port is defined where the CPU or memory chip would be connected. In the test system, a high-frequency connector was installed at each port, allowing the user to inject a signal from each port and observe the signal received at the other ports.

Figure 8.5: Interconnect network on a printed circuit board considered in Section 8.3.7. The four measurement ports of the vector network analyzer were connected as shown in the figure.

In this example, we consider the two lower traces in Figure 8.5, which have connectors J72, J71, J64, J61 soldered at their ends. The scattering matrix $H(j\omega)$ of this 4-port device was measured from 10 MHz to 40 GHz in steps of 10 MHz with a Keysight N5227A vector network analyzer (courtesy of Fadime Bekmambetova, University of Toronto). In the scattering representation, input $U_m(j\omega)$ is the amplitude of the electromagnetic wave injected into port $m$ by the instrument. Output $Y_q(j\omega)$ is the amplitude of the wave received at port $q$. The scattering representation is commonly used at high frequency since it can be measured more accurately compared to the impedance or admittance representations used at low frequency.

A commercial implementation of the VF algorithm (IdEM, Dassault Systemes) 
was used to generate a reduced order model from the measured samples (courtesy of 
Prof. Stefano Grivet-Talocia, Politecnico di Torino). Figure 8.6 compares the VF model 
response to the original samples for the (1, 2) element of the scattering matrix. This 
response is the ratio between the amplitude of the wave received at one end of the 
trace (port 1) and the amplitude of the wave injected at the other end (port 2). We see 
that, as frequency increases, the received signal is progressively weaker, due to higher 
attenuation. The agreement between the VF model and the samples is excellent over 
the entire frequency range spanned by the measured data. Figure 8.7 compares the 
model response to the measured samples for the (1,3) entry of the scattering matrix, 
which describes the signal received on the lower copper trace in Figure 8.5 when only 
the upper trace is excited. This coefficient is about 25 times smaller than the (1, 2) 
coefficient, since the two traces are not directly connected, and any coupling is due to electromagnetic interference. We can see that the VF model approximates this small entry very accurately.

Figure 8.6: Example of Section 8.3.7: comparison between samples $H_{k,12}$ and corresponding VF model response.

Figure 8.7: Example of Section 8.3.7: comparison between samples $H_{k,13}$ and corresponding VF model response.

Figure 8.8 plots the samples-model error $e^{(i)}$ as a function of $i$, together with the order $\bar{n}$ used by VF at each iteration. In this example, the order is adapted throughout iterations with the adding and skimming process [33] described in Section 8.3.11.1. We observe that VF is able to progressively reduce the error throughout iterations, but convergence is slower than in the analytical example of Section 8.3.3. This happens for two reasons. First, this implementation of VF adaptively determines order $\bar{n}$ in a single run, without requiring the user to determine a suitable $\bar{n}$ with multiple VF runs. Second, some noise is unavoidably present in the experimental measurements, which slows down convergence, and prevents VF from reducing the fitting error below $10^{-3}$. Indeed, we can see that VF is unable to increase model accuracy after the 10th iteration. Ultimately, VF delivers a reduced model with an error of $1.34 \cdot 10^{-3}$, which is adequate for most design purposes.

Figure 8.8: Example of Section 8.3.7: VF error as a function of iteration, compared to the desired error level. Labels indicate the order $\bar{n}$ used by VF at each iteration.


8.3.8 A real-valued formulation of VF and fast VF 


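In most systems of practical interest, input $u(t)$ and output $y(t)$ are real-valued. Consequently, poles $p_n$ and residues $R_n$ are expected to be either real or in complex conjugate pairs. Because of round-off errors, the VF algorithm described so far may not ensure this realness condition. In this section, we describe a real-valued version of Fast VF which can be implemented in real arithmetic, and will ensure the realness condition by construction. The pseudo-code of the described algorithm is given in Algorithm 8.2. An open-source implementation of this algorithm, which closely follows the notation and pseudo-code in this chapter, can be downloaded from [75].

To ensure complex conjugate poles and residues, we redefine model (8.33) as

\[
H^{(i)}(s) = R_0^{(i)} + \sum_{n=1}^{\bar{n}_r} \frac{R_n^{(i)}}{s - p_n^{(i-1)}}
+ \sum_{n=\bar{n}_r+1}^{\bar{n}_r+\bar{n}_c} \left[ \frac{R_n^{(i)}}{s - p_n^{(i-1)}} + \frac{(R_n^{(i)})^*}{s - (p_n^{(i-1)})^*} \right]
\tag{8.43}
\]

where $\bar{n}_r$ is the number of real poles and $\bar{n}_c$ is the number of pairs of complex conjugate poles, for a total order $\bar{n} = \bar{n}_r + 2\bar{n}_c$. In (8.43), we force $R_n^{(i)} \in \mathbb{R}^{\bar{q} \times \bar{m}}$ for $n = 0, \dots, \bar{n}_r$. The VF weighting function (8.15) is redefined in a similar fashion as

\[
w^{(i)}(s) = 1 + \sum_{n=1}^{\bar{n}_r} \frac{w_n^{(i)}}{s - p_n^{(i-1)}}
+ \sum_{n=\bar{n}_r+1}^{\bar{n}_r+\bar{n}_c} \left[ \frac{w_n^{(i)}}{s - p_n^{(i-1)}} + \frac{(w_n^{(i)})^*}{s - (p_n^{(i-1)})^*} \right]
\tag{8.44}
\]

where $w_n^{(i)} \in \mathbb{R}$ for $n = 1, \dots, \bar{n}_r$. Using (8.43) and (8.44), and following the steps in Section 8.3.5, one can arrive at a least-squares system in the same form as (8.38)

\[
\begin{bmatrix}
\Phi_0^{(i)} & 0 & \cdots & 0 & -D_{H_{11}} \Phi_1^{(i)} \\
0 & \Phi_0^{(i)} & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & \Phi_0^{(i)} & -D_{H_{\bar{q}\bar{m}}} \Phi_1^{(i)}
\end{bmatrix}
\begin{bmatrix}
c_{H_{11}}^{(i)} \\ \vdots \\ c_{H_{\bar{q}\bar{m}}}^{(i)} \\ c_w^{(i)}
\end{bmatrix}
=
\begin{bmatrix}
V_{H_{11}} \\ \vdots \\ V_{H_{\bar{q}\bar{m}}}
\end{bmatrix}
\tag{8.45}
\]

but where we take as unknowns the real and imaginary part of each residue in (8.43)

\[
c_{H_{qm}}^{(i)} = \begin{bmatrix} R_{0,qm}^{(i)} & \cdots & R_{\bar{n}_r,qm}^{(i)} & \operatorname{Re}\{R_{\bar{n}_r+1,qm}^{(i)}\} & \operatorname{Im}\{R_{\bar{n}_r+1,qm}^{(i)}\} & \cdots \end{bmatrix}^T
\tag{8.46}
\]

and the real and imaginary part of each residue of the weighting function (8.44)

\[
c_w^{(i)} = \begin{bmatrix} w_1^{(i)} & \cdots & w_{\bar{n}_r}^{(i)} & \operatorname{Re}\{w_{\bar{n}_r+1}^{(i)}\} & \operatorname{Im}\{w_{\bar{n}_r+1}^{(i)}\} & \cdots \end{bmatrix}^T.
\tag{8.47}
\]

This choice of unknowns will ensure that complex residues always come in conjugate pairs. The coefficient matrices $\Phi_0^{(i)}$ and $\Phi_1^{(i)}$ in (8.45) are given by

\[
\Phi_0^{(i)} = \begin{bmatrix} 1_{\bar{k}} & \Phi_r^{(i)} & \Phi_c^{(i)} \end{bmatrix},
\tag{8.48}
\]

\[
\Phi_1^{(i)} = \begin{bmatrix} \Phi_r^{(i)} & \Phi_c^{(i)} \end{bmatrix},
\tag{8.49}
\]

where $1_{\bar{k}}$ is a $\bar{k} \times 1$ vector of ones, and

\[
\Phi_r^{(i)} =
\begin{bmatrix}
\dfrac{1}{j\omega_1 - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_1 - p_{\bar{n}_r}^{(i-1)}} \\
\vdots & & \vdots \\
\dfrac{1}{j\omega_{\bar{k}} - p_1^{(i-1)}} & \cdots & \dfrac{1}{j\omega_{\bar{k}} - p_{\bar{n}_r}^{(i-1)}}
\end{bmatrix}
\tag{8.50}
\]

\[
\Phi_c^{(i)} =
\begin{bmatrix}
\cdots & \dfrac{1}{j\omega_1 - p_n^{(i-1)}} + \dfrac{1}{j\omega_1 - (p_n^{(i-1)})^*} & \dfrac{j}{j\omega_1 - p_n^{(i-1)}} - \dfrac{j}{j\omega_1 - (p_n^{(i-1)})^*} & \cdots \\
 & \vdots & \vdots & \\
\cdots & \dfrac{1}{j\omega_{\bar{k}} - p_n^{(i-1)}} + \dfrac{1}{j\omega_{\bar{k}} - (p_n^{(i-1)})^*} & \dfrac{j}{j\omega_{\bar{k}} - p_n^{(i-1)}} - \dfrac{j}{j\omega_{\bar{k}} - (p_n^{(i-1)})^*} & \cdots
\end{bmatrix}
\tag{8.51}
\]

with one pair of columns in (8.51) for each $n = \bar{n}_r+1, \dots, \bar{n}_r+\bar{n}_c$.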
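Although (8.45) has real unknowns, its coefficient matrix and right-hand side are still complex-valued. To remedy this issue, we write the real and imaginary part of each equation separately

\[
\begin{bmatrix}
\operatorname{Re}\{\Phi_0^{(i)}\} & 0 & \cdots & 0 & -\operatorname{Re}\{D_{H_{11}} \Phi_1^{(i)}\} \\
\operatorname{Im}\{\Phi_0^{(i)}\} & 0 & \cdots & 0 & -\operatorname{Im}\{D_{H_{11}} \Phi_1^{(i)}\} \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 0 & \operatorname{Re}\{\Phi_0^{(i)}\} & -\operatorname{Re}\{D_{H_{\bar{q}\bar{m}}} \Phi_1^{(i)}\} \\
0 & \cdots & 0 & \operatorname{Im}\{\Phi_0^{(i)}\} & -\operatorname{Im}\{D_{H_{\bar{q}\bar{m}}} \Phi_1^{(i)}\}
\end{bmatrix}
\begin{bmatrix}
c_{H_{11}}^{(i)} \\ \vdots \\ c_{H_{\bar{q}\bar{m}}}^{(i)} \\ c_w^{(i)}
\end{bmatrix}
=
\begin{bmatrix}
\operatorname{Re}\{V_{H_{11}}\} \\ \operatorname{Im}\{V_{H_{11}}\} \\ \vdots \\ \operatorname{Re}\{V_{H_{\bar{q}\bar{m}}}\} \\ \operatorname{Im}\{V_{H_{\bar{q}\bar{m}}}\}
\end{bmatrix}.
\tag{8.52}
\]

The obtained system, which has real coefficients and unknowns, will ensure, by construction, that model poles and residues are either real or complex conjugate. Due to its block structure, system (8.52) can be efficiently solved with the Fast VF approach discussed in Section 8.3.6. In step 4 of Algorithm 8.2, the QR decompositions

\[
\begin{bmatrix}
\operatorname{Re}\{\Phi_0^{(i)}\} & -\operatorname{Re}\{D_{H_{qm}} \Phi_1^{(i)}\} \\
\operatorname{Im}\{\Phi_0^{(i)}\} & -\operatorname{Im}\{D_{H_{qm}} \Phi_1^{(i)}\}
\end{bmatrix}
=
\begin{bmatrix} Q_{qm}^{11} & Q_{qm}^{12} \\ Q_{qm}^{21} & Q_{qm}^{22} \end{bmatrix}
\begin{bmatrix} R_{qm}^{11} & R_{qm}^{12} \\ 0 & R_{qm}^{22} \end{bmatrix}
\tag{8.53}
\]

are computed for $q = 1, \dots, \bar{q}$ and $m = 1, \dots, \bar{m}$. Then, in step 5, the reduced system

\[
\begin{bmatrix} R_{11}^{22} \\ \vdots \\ R_{\bar{q}\bar{m}}^{22} \end{bmatrix} c_w^{(i)}
=
\begin{bmatrix}
(Q_{11}^{12})^T \operatorname{Re}\{V_{H_{11}}\} + (Q_{11}^{22})^T \operatorname{Im}\{V_{H_{11}}\} \\
\vdots \\
(Q_{\bar{q}\bar{m}}^{12})^T \operatorname{Re}\{V_{H_{\bar{q}\bar{m}}}\} + (Q_{\bar{q}\bar{m}}^{22})^T \operatorname{Im}\{V_{H_{\bar{q}\bar{m}}}\}
\end{bmatrix}
\tag{8.54}
\]

is solved in the least-squares sense to determine $c_w^{(i)}$ and compute the new poles estimate with the real-valued counterpart of (8.18), which reads [35]

\[
\{p_n^{(i)}\} = \operatorname{eig}\left( A^{(i-1)} - b_w \, (c_w^{(i)})^T \right),
\tag{8.55}
\]

with $A^{(i-1)} = \operatorname{diag}\{p_1^{(i-1)}, \dots, p_{\bar{n}_r}^{(i-1)}, \Pi_{\bar{n}_r+1}^{(i-1)}, \dots, \Pi_{\bar{n}_r+\bar{n}_c}^{(i-1)}\}$ being a block diagonal matrix formed by the real poles and, for complex conjugate pairs, by the blocks

\[
\Pi_n^{(i-1)} =
\begin{bmatrix}
\operatorname{Re}\{p_n^{(i-1)}\} & \operatorname{Im}\{p_n^{(i-1)}\} \\
-\operatorname{Im}\{p_n^{(i-1)}\} & \operatorname{Re}\{p_n^{(i-1)}\}
\end{bmatrix}.
\tag{8.56}
\]

In (8.55), $b_w$ is a $\bar{n} \times 1$ vector with the first $\bar{n}_r$ entries set to one, followed by a $[2, 0]^T$ block for each pair of complex conjugate poles.

Once poles have been estimated, a first convergence test is performed in step 8 using (8.27). If the test is passed, in step 9 of Algorithm 8.2 we fit the residues of the final model, solving in the least-squares sense

\[
\begin{bmatrix} \operatorname{Re}\{\Phi_0^{(i+1)}\} \\ \operatorname{Im}\{\Phi_0^{(i+1)}\} \end{bmatrix} c_{H_{qm}}^{(i+1)}
=
\begin{bmatrix} \operatorname{Re}\{V_{H_{qm}}\} \\ \operatorname{Im}\{V_{H_{qm}}\} \end{bmatrix}
\tag{8.57}
\]

for $q = 1, \dots, \bar{q}$ and $m = 1, \dots, \bar{m}$. The second and final convergence test is performed in step 11 of Algorithm 8.2.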
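The real-valued pole update (8.55) can be sketched as follows. The function names `relocate_poles_real` and `weighting` are hypothetical, the numerical values are arbitrary, and the coefficient ordering is assumed to follow (8.47): real residues first, then the real and imaginary part of one residue per conjugate pair. The second function evaluates (8.44) directly, so that the eigenvalues can be checked to be zeros of the weighting function.

```python
import numpy as np

def relocate_poles_real(real_poles, complex_poles, c_w):
    """New poles as eig(A - b_w c_w^T), all in real arithmetic, as in (8.55).

    real_poles:    real poles p_n
    complex_poles: one representative per complex conjugate pair
    c_w:           [w_1..w_nr, Re w, Im w, ...] ordered as in (8.47)
    """
    nr, nc = len(real_poles), len(complex_poles)
    A = np.zeros((nr + 2 * nc, nr + 2 * nc))
    b = np.zeros(nr + 2 * nc)
    A[:nr, :nr] = np.diag(real_poles)
    b[:nr] = 1.0
    for idx, p in enumerate(complex_poles):
        r = nr + 2 * idx
        # 2x2 block for the pair, as in (8.56).
        A[r:r + 2, r:r + 2] = [[p.real, p.imag], [-p.imag, p.real]]
        b[r] = 2.0           # [2, 0]^T block for each conjugate pair
    return np.linalg.eigvals(A - np.outer(b, c_w))

def weighting(s, real_poles, complex_poles, c_w):
    """Evaluate w(s) of (8.44) from the same real-valued coefficients."""
    nr = len(real_poles)
    w = 1.0 + sum(c_w[n] / (s - real_poles[n]) for n in range(nr))
    for idx, p in enumerate(complex_poles):
        wn = c_w[nr + 2 * idx] + 1j * c_w[nr + 2 * idx + 1]
        w += wn / (s - p) + np.conj(wn) / (s - np.conj(p))
    return w

real_poles = [-1.0, -2.0]
complex_poles = [-0.5 + 3j]
c_w = np.array([0.3, -0.2, 0.4, 0.1])
new_poles = relocate_poles_real(real_poles, complex_poles, c_w)
residual = max(abs(weighting(z, real_poles, complex_poles, c_w)) for z in new_poles)
```

Since the matrix passed to the eigenvalue solver is real, the returned poles are automatically real or in complex conjugate pairs, which is the realness property this formulation is designed to guarantee.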
Algorithm 8.2: Fast Vector Fitting, real-valued implementation.


Require: response samples $H_k$, corresponding frequencies $\omega_k$ ($k = 1, \dots, \bar{k}$)
Require: desired model order $\bar{n}$
Require: maximum number of iterations $i_{\max}$

 1: set initial poles $p_n^{(0)}$ according to (8.19) or (8.20)
 2: $i \leftarrow 1$
 3: while $i \le i_{\max}$ do
 4:     Compute QR decompositions (8.53)
 5:     Solve (8.54) in the least-squares sense
 6:     Compute the new poles estimate $p_n^{(i)}$ with (8.55)
 7:     Enforce poles stability with (8.71), if desired         ▷ Stability enforcement
 8:     if (8.27) is true then                                  ▷ First convergence test
 9:         Solve (8.57) in the least-squares sense             ▷ Tentative final fitting
10:         Compute fitting error $e$ with (8.34)
11:         if $e < \varepsilon_H$ then                          ▷ Second convergence test
12:             $H(s) = H^{(i+1)}(s)$
13:             return Success!
14:         end if
15:     end if
16:     $i \leftarrow i + 1$
17: end while
18: return Failure: maximum number of iterations reached.


8.3.9 Model realization 


The real-valued formulation of VF, discussed in Section 8.3.8, produces a reduced model in the form

\[
H(s) = R_0 + \sum_{n=1}^{\bar{n}_r} \frac{R_n}{s - p_n}
+ \sum_{n=\bar{n}_r+1}^{\bar{n}_r+\bar{n}_c} \left[ \frac{R_n}{s - p_n} + \frac{R_n^*}{s - p_n^*} \right],
\tag{8.58}
\]

which can be easily converted into a variety of equivalent representations to facilitate its use in different simulation scenarios. Expression (8.58) is known as pole-residue form of the transfer function. This form is the most convenient when the model will be used in frequency domain analyses, since it minimizes the computational cost of evaluating $H(j\omega)$.

For time domain analyses, such as transient simulations, expression (8.58) can be converted into the time domain with the inverse Laplace transform, which yields

\[
h(t) = R_0 \delta(t) + \sum_{n=1}^{\bar{n}_r} R_n e^{p_n t}
+ \sum_{n=\bar{n}_r+1}^{\bar{n}_r+\bar{n}_c} \left[ 2 R_n' e^{p_n' t} \cos(p_n'' t) - 2 R_n'' e^{p_n' t} \sin(p_n'' t) \right]
\tag{8.59}
\]

for $t \ge 0$, where $p_n' = \operatorname{Re}\{p_n\}$, $p_n'' = \operatorname{Im}\{p_n\}$, $R_n' = \operatorname{Re}\{R_n\}$ and $R_n'' = \operatorname{Im}\{R_n\}$. In (8.59), $h(t)$ denotes the impulse response of the model. This form is particularly convenient in transient simulators based on convolutions like (8.1). While computing convolution integrals is in general very expensive, when an impulse response has the form (8.59), convolution can be computed very quickly using recursive formulas [35].
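The real-valued impulse response (8.59) can be evaluated directly from poles and residues, as in the sketch below for a scalar transfer function. The function name `impulse_response` and the numerical values are illustrative assumptions, and the impulsive term $R_0 \delta(t)$ is left out since it only acts at $t = 0$.

```python
import numpy as np

def impulse_response(t, real_poles, real_res, cplx_poles, cplx_res):
    """h(t) for t > 0 from (8.59), scalar case, without the R_0 delta(t) term."""
    t = np.asarray(t, dtype=float)
    h = np.zeros_like(t)
    for p, r in zip(real_poles, real_res):
        h += r * np.exp(p * t)                       # real pole: decaying exponential
    for p, r in zip(cplx_poles, cplx_res):
        # One term per conjugate pair: damped cosine and sine.
        h += 2 * r.real * np.exp(p.real * t) * np.cos(p.imag * t)
        h -= 2 * r.imag * np.exp(p.real * t) * np.sin(p.imag * t)
    return h

t = np.linspace(0.01, 5.0, 50)
h = impulse_response(t, [-1.0], [0.5], [-0.2 + 3j], [0.3 - 0.4j])
# Cross-check against the complex form R e^{pt} + R^* e^{p^* t} = 2 Re{R e^{pt}}.
h_ref = 0.5 * np.exp(-t) + 2 * np.real((0.3 - 0.4j) * np.exp((-0.2 + 3j) * t))
```

Each conjugate pair contributes a single real damped oscillation, so the response stays real without any complex arithmetic.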

While convolutional simulators are prominent in selected applications, the majority of transient simulators are based on the solution of differential equations, and cannot handle (8.59) directly. To overcome this issue, we can represent (8.58) through a set of differential equations in state-space form

\[
\begin{aligned}
\dot{x}(t) &= A x(t) + B u(t), \\
y(t) &= C x(t) + D u(t).
\end{aligned}
\tag{8.60}
\]

System (8.60) is constructed in such a way that the transfer function between input $u(t)$ and output $y(t)$ is (8.58). Given a transfer function, there are infinitely many systems (8.60) that meet this criterion, known as realizations of $H(s)$. We present a popular realization, due to Gilbert [27], and refer the reader to [35] for a comprehensive description of how VF models can be realized.

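For reasons that will become clear later on, the Gilbert realization process begins with the truncated singular value decomposition [28] of residues $R_n$

\[
R_n = U_n \Sigma_n V_n^H \quad \text{for } n = 1, \dots, \bar{n}_r + \bar{n}_c,
\tag{8.61}
\]

where $\Sigma_n = \operatorname{diag}\{\sigma_{n,1}, \dots, \sigma_{n,\rho_n}\}$ is a diagonal matrix collecting all nonzero singular values of $R_n$, and $\rho_n$ is the rank of $R_n$. Matrices $U_n \in \mathbb{C}^{\bar{q} \times \rho_n}$ and $V_n \in \mathbb{C}^{\bar{m} \times \rho_n}$ are formed by the left and right singular vectors of $R_n$, respectively. Given (8.61), we can express the partial fractions in (8.58) associated with real poles as

\[
\frac{R_n}{s - p_n} = U_n \Sigma_n \frac{I_{\rho_n}}{s - p_n} V_n^H = C_n (s I_{\rho_n} - A_n)^{-1} B_n,
\tag{8.62}
\]

for $n = 1, \dots, \bar{n}_r$. In (8.62), $I_{\rho_n}$ is the identity matrix of size $\rho_n \times \rho_n$, $C_n = U_n \Sigma_n$, $A_n = p_n I_{\rho_n}$, and $B_n = V_n^H$. For complex poles, we can derive an equivalent expression for the sum of the two conjugate partial fractions [35]

\[
\frac{R_n}{s - p_n} + \frac{R_n^*}{s - p_n^*} = C_n (s I_{2\rho_n} - A_n)^{-1} B_n
\tag{8.63}
\]

for $n = \bar{n}_r + 1, \dots, \bar{n}_r + \bar{n}_c$, where

\[
A_n = \begin{bmatrix} p_n' I_{\rho_n} & p_n'' I_{\rho_n} \\ -p_n'' I_{\rho_n} & p_n' I_{\rho_n} \end{bmatrix},
\qquad
B_n = 2 \begin{bmatrix} \operatorname{Re}\{V_n\}^T \\ \operatorname{Im}\{V_n\}^T \end{bmatrix},
\tag{8.64}
\]

\[
C_n = \begin{bmatrix} \operatorname{Re}\{U_n \Sigma_n\} & \operatorname{Im}\{U_n \Sigma_n\} \end{bmatrix}.
\tag{8.65}
\]

Expressions (8.62) and (8.63) allow us to rewrite (8.58) as

\[
H(s) = D + C (s I_N - A)^{-1} B,
\tag{8.66}
\]

where

\[
A = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_{\bar{n}_r+\bar{n}_c} \end{bmatrix},
\qquad
B = \begin{bmatrix} B_1 \\ \vdots \\ B_{\bar{n}_r+\bar{n}_c} \end{bmatrix},
\tag{8.67}
\]

\[
C = \begin{bmatrix} C_1 & \cdots & C_{\bar{n}_r+\bar{n}_c} \end{bmatrix},
\qquad
D = R_0.
\tag{8.68}
\]

Since (8.66) is the transfer function of (8.60), equations (8.67) and (8.68) provide the coefficient matrices of a state-space realization (8.60) of the transfer function (8.58) produced by VF. The order of (8.66) is

\[
N = \sum_{n=1}^{\bar{n}_r} \rho_n + 2 \sum_{n=\bar{n}_r+1}^{\bar{n}_r+\bar{n}_c} \rho_n
\tag{8.69}
\]

and can be shown to be minimal [27]. This property stems from the singular value decompositions (8.61), which reveal the rank $\rho_n$ of each residue $R_n$. If those singular value decompositions are not performed, a realization of order $\bar{n}\bar{m}$ is obtained. This realization may not be minimal, and may contain states that are not controllable, not observable, or both, as discussed in Chapter 2 of this volume.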
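The construction (8.61)-(8.68) can be sketched in Python as below. This is an illustration under stated assumptions, not the implementation of [75]: the function names `truncated_svd` and `gilbert_realization`, the relative tolerance used to decide the rank, and the test residues are all hypothetical choices.

```python
import numpy as np

def truncated_svd(R, tol=1e-12):
    """SVD of a residue keeping only singular values above tol * largest."""
    U, sv, Vh = np.linalg.svd(R)
    rank = int(np.sum(sv > tol * sv[0]))
    return U[:, :rank], sv[:rank], Vh[:rank, :].conj().T   # U_n, sigma_n, V_n

def gilbert_realization(R0, real_poles, real_res, cplx_poles, cplx_res):
    """State-space matrices (A, B, C, D) following (8.61)-(8.68)."""
    A_blocks, B_blocks, C_blocks = [], [], []
    for p, R in zip(real_poles, real_res):
        U, sv, V = truncated_svd(np.asarray(R, dtype=float))
        A_blocks.append(p * np.eye(sv.size))
        B_blocks.append(V.T)                    # V_n^H for a real residue
        C_blocks.append(U * sv)                 # U_n Sigma_n
    for p, R in zip(cplx_poles, cplx_res):
        U, sv, V = truncated_svd(np.asarray(R, dtype=complex))
        I = np.eye(sv.size)
        US = U * sv
        A_blocks.append(np.block([[p.real * I, p.imag * I],
                                  [-p.imag * I, p.real * I]]))
        B_blocks.append(2 * np.vstack([V.real.T, V.imag.T]))
        C_blocks.append(np.hstack([US.real, US.imag]))
    N = sum(a.shape[0] for a in A_blocks)
    A = np.zeros((N, N))
    row = 0
    for a in A_blocks:                          # assemble block-diagonal A
        A[row:row + a.shape[0], row:row + a.shape[0]] = a
        row += a.shape[0]
    return A, np.vstack(B_blocks), np.hstack(C_blocks), np.asarray(R0, dtype=float)

# Two-port example: one real pole with a rank-1 residue, one complex pair with a rank-2 residue.
R0 = np.array([[0.1, 0.0], [0.0, 0.2]])
R1 = np.array([[1.0, 2.0], [2.0, 4.0]])
R2 = np.array([[1 + 1j, 0.5], [0.2j, 2 - 1j]])
p1, p2 = -1.0, -0.5 + 3j
A, B, C, D = gilbert_realization(R0, [p1], [R1], [p2], [R2])

s = 2j                                          # evaluate both forms at one frequency
H_ss = D + C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)
H_pr = R0 + R1 / (s - p1) + R2 / (s - p2) + np.conj(R2) / (s - np.conj(p2))
```

In this example the real pole contributes a single state because its residue has rank 1, so the realization has order 5 instead of the 6 states that would result from skipping the singular value decompositions.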

In addition to the forms presented in this section, the VF model (8.58) can be con- 
verted to a variety of additional forms, including equivalent electric circuits [5, 35] for 
seamless integration into any circuit simulator. 


8.3.10 Stability, causality and passivity enforcement 


Most systems of practical interest are stable, and the real part of their poles is either 
negative or zero. One would expect that, given noise-free samples of their frequency 
response, VF will produce a model with stable poles satisfying 


\[
\operatorname{Re}\{p_n\} < 0 \quad \forall n.
\tag{8.70}
\]


Unfortunately, this is not guaranteed, since round-off errors may indeed push a few 
poles into the right half of the complex plane, making the VF model unstable. 

Condition (8.70) is essentially mandatory for time domain simulations, since otherwise results will diverge. The standard practice is to enforce stability during VF iterations. After computing the new poles estimate $p_n^{(i)}$ with (8.18), the following rule is applied:

\[
p_n^{(i)} =
\begin{cases}
p_n^{(i)} & \text{if } \operatorname{Re}\{p_n^{(i)}\} < 0, \\
-\operatorname{Re}\{p_n^{(i)}\} + j \operatorname{Im}\{p_n^{(i)}\} & \text{if } \operatorname{Re}\{p_n^{(i)}\} > 0,
\end{cases}
\tag{8.71}
\]

for $n = 1, \dots, \bar{n}_r + \bar{n}_c$. We can see that, if a pole $p_n^{(i)}$ is unstable, the sign of its real part is inverted. Since in the tentative final fitting in step 8 of Algorithm 8.1 poles are fixed, condition (8.71) ensures the stability of the final model.
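Rule (8.71) amounts to mirroring unstable poles across the imaginary axis, which can be sketched as follows. The function name `enforce_stability` and the sample poles are hypothetical.

```python
import numpy as np

def enforce_stability(poles):
    """Flip the sign of the real part of any pole with positive real part, as in (8.71)."""
    poles = np.asarray(poles, dtype=complex)
    unstable = poles.real > 0
    # Stable poles are returned unchanged; imaginary parts are never modified.
    return np.where(unstable, -poles.real + 1j * poles.imag, poles)

flipped = enforce_stability([-1.0 + 2j, 0.3 + 5j, 0.3 - 5j, -0.7])
```

Because both members of a conjugate pair share the same real part, they are flipped together and the pair structure is preserved.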

For frequency domain analyses, one may think that (8.70) is not necessary, since stability is not an issue. However, one can show that (8.70), in the frequency domain, becomes a condition for causality [82]. Causality means that the system will react to an excitation only after it has been applied, and not before. In other words, if the system input $u(t)$ begins at $t = t_0$ ($u(t) = 0$ for $t < t_0$), the system output will start varying only at or after $t = t_0$. All systems in nature are obviously causal, since they cannot “anticipate” the application of an excitation. Enforcing (8.70) ensures that VF model (8.58) is causal. If this is not the case, frequency domain analyses will succeed, but results may be inaccurate and unphysical. In particular, the VF model may underestimate the delay between input and output which is present in the real system, which may be important in some applications, such as the timing analysis of digital circuits. A complete discussion of causality is beyond the scope of this chapter, and the reader is referred to [82].

Overall, condition (8.70) simultaneously enforces the stability and causality of the
VF model. This condition can be enforced without any accuracy penalty when the
given samples $H_k$ are error free, and thus faithfully represent the response of a causal
and stable system. When samples are corrupted by noise or measurement errors, VF
may be unable to reduce fitting error (8.6) to the desired level if condition (8.70) is
enforced. This happens when the noise or errors in samples $H_k$ are not causal functions
themselves, and thus cannot be approximated with stable and causal poles [82].
Numerical algorithms exist to verify whether the given samples $H_k$ satisfy the causality
condition required by VF to fit them with high accuracy [77, 78, 51, 76, 7].

In addition to causality and stability, passivity is another important property that 
one may want to impose on the VF model (8.58). This property characterizes those 
physical systems that are unable to generate energy on their own, simply due to the 
lack of energy sources or gain mechanisms inside them. A circuit made of positive
resistors, capacitors and inductors is an example of a passive system, in contrast to
an amplifying circuit. When applied to the response of a passive system, VF may still
produce a non-passive model, due to approximation and numerical errors. However,
passivity can be enforced a posteriori, with the methods presented in Chapter 5 of this
volume. 


8.3.11 Numerical implementation 


Vector Fitting is easy to implement, and several free codes are available [75, 38]. This 
section briefly describes a few changes to the basic templates in Algorithms 8.1 and 8.2 
that can lead to a more robust and efficient implementation.


8.3.11.1 Order estimation


The VF templates in Algorithms 8.1 and 8.2 require the desired model order $\bar{n}$ as input.
Typically, this is not known a priori, but can be determined during the fitting process
using the VF algorithm with adding and skimming [33], as shown in the example
of Section 8.3.7. In this method, an initial estimate of $\bar{n}$ is derived from the phase of
the given samples $H_k$, and used in the first VF iteration. Then $\bar{n}$ is automatically
increased or decreased based on the achieved error, as visible in Figure 8.8. If error e is
still too high, the order is increased until either VF converges or it becomes evident
that no further error reduction can be achieved, as in the last four iterations in
Figure 8.8. Conversely, when the algorithm detects that some partial fractions in (8.58)
give a negligible contribution over the frequency range of interest, the order $\bar{n}$ is
reduced at the next iteration by removing such terms. This happens in the 13th iteration
of the example in Section 8.3.7, where the order is reduced from 226 to 200.


8.3.11.2 Relaxed VF: a better normalization of the weighting function 


In the original VF algorithm, the coefficients of weighting function (8.15) are normalized
such that $w^{(i)}(j\omega) \to 1$ when $\omega \to \infty$. It can be shown that this normalization is
not optimal, and can slow down VF convergence when samples $H_k$ are contaminated
by noise. The relaxed VF algorithm [41] mitigates this issue by redefining the weighting
function as

$$w^{(i)}(s) = w_0^{(i)} + \sum_{n=1}^{\bar{n}} \frac{w_n^{(i)}}{s - p_n^{(i-1)}}, \tag{8.72}$$

where $w_0^{(i)}$ is now free to depart from one. With this change, the fitting equation (8.37)
becomes

$$R_{0,qm}^{(i)} + \sum_{n=1}^{\bar{n}} \frac{R_{n,qm}^{(i)}}{j\omega_k - p_n^{(i-1)}} - H_{k,qm} \left( w_0^{(i)} + \sum_{n=1}^{\bar{n}} \frac{w_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right) = 0. \tag{8.73}$$

Since (8.73) admits a trivial solution ($R_{n,qm}^{(i)} = w_n^{(i)} = 0 \ \forall n$), the relaxed VF algorithm
adds an additional constraint to exclude it [41]

$$\mathrm{Re} \left\{ \sum_{k=1}^{\bar{k}} \left( w_0^{(i)} + \sum_{n=1}^{\bar{n}} \frac{w_n^{(i)}}{j\omega_k - p_n^{(i-1)}} \right) \right\} = \bar{k}. \tag{8.74}$$

This constraint can be seen as a more relaxed normalization of the weighting function.
Equations (8.73) and (8.74) are then jointly solved in the least-squares sense. In the
single-input single-output case ($\bar{q} = \bar{m} = 1$), the system to be solved takes the form


DP -DaF | fe] _ fo (8.75) 
o apo] [ce] L l 
k` k 0 w 
where

$$c_w^{(i)} = \begin{bmatrix} w_0^{(i)} & w_1^{(i)} & \cdots & w_{\bar{n}}^{(i)} \end{bmatrix}^T. \tag{8.76}$$


In (8.75), B is a suitable weight to the last equation, which is typically set to [41] 


k 
B= È |H;l?. (8.77) 
k=1 
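
To make the relaxed formulation more concrete, the following Python sketch performs one pole relocation step for a scalar response by stacking the fitting equations (8.73) and the relaxed constraint (8.74) into a real least-squares problem. It is a simplified illustration under stated assumptions, not the reference implementation of [41]: the starting poles are assumed real (practical codes handle complex-conjugate pairs with a real-valued basis), the constraint weight `beta` defaults to the 2-norm of the samples as an assumed choice, and the function name is hypothetical. The new poles are obtained as the zeros of the weighting function, computed as the eigenvalues of $\mathrm{diag}(p_n) - \mathbf{1} w^T / w_0$.

```python
import numpy as np


def relaxed_vf_pole_update(omega, H, poles, beta=None):
    """One relaxed-VF pole relocation for a scalar response with real starting poles."""
    omega = np.asarray(omega, dtype=float)
    H = np.asarray(H, dtype=complex)
    poles = np.asarray(poles, dtype=float)
    K, N = omega.size, poles.size
    # Basis [1, 1/(j w_k - p_1), ..., 1/(j w_k - p_N)], one row per sample.
    basis = np.hstack([np.ones((K, 1)), 1.0 / (1j * omega[:, None] - poles[None, :])])
    # Fitting equations (8.73); real unknowns x = [R_0..R_N, w_0..w_N].
    fit = np.hstack([basis, -H[:, None] * basis])
    if beta is None:
        beta = np.linalg.norm(H)  # assumed weight for the constraint row
    # Relaxed normalization (8.74): Re{sum_k w(j w_k)} = K, scaled by beta / K.
    constraint = np.concatenate([np.zeros(N + 1), basis.sum(axis=0).real]) * beta / K
    A = np.vstack([fit.real, fit.imag, constraint[None, :]])
    b = np.zeros(2 * K + 1)
    b[-1] = beta
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    w0, w = x[N + 1], x[N + 2:]
    # New poles are the zeros of w(s): eigenvalues of diag(p) - 1 w^T / w0.
    return np.linalg.eigvals(np.diag(poles) - np.outer(np.ones(N), w) / w0)


omega = np.logspace(-1, 2, 50)
s = 1j * omega
H = 1 + 2 / (s + 1) + 3 / (s + 5)      # noise-free samples of a known system
print(relaxed_vf_pole_update(omega, H, [-2.0, -8.0]))
```

With noise-free samples and a model order equal to the true order, the linearized problem has an exact solution, so a single step is expected to recover the true poles up to round-off; with noisy data several iterations are normally required.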


8.4 Generalized and advanced VF algorithms 


Since its inception in 1996, VF has inspired a generation of algorithms for the data- 
driven modeling of linear systems. These extensions either improve the original VF 
formulation, or extend it to different modeling scenarios. We briefly summarize the 
most relevant work in this area, and provide several bibliographic references where 
more details can be found. 


8.4.1 Time domain VF algorithms 


The original VF algorithm works in the frequency domain, and creates the reduced
model from samples of the system frequency response. In some applications, however,
it is more convenient to characterize the system in the time domain. For example,
one may have simultaneous measurements of the system input $u(t_l)$ and output
$y(t_l)$ at several time points $t_l$ for $l = 1, \dots, \bar{l}$, as in the example of Section 8.3.4. In this
scenario, one has two options. The first is to estimate the system’s frequency response
from the time domain samples with the discrete Fourier transform, and apply VF in the
frequency domain. However, the accuracy of the discrete Fourier transform depends
significantly on the sampling rate of the given samples, and on their behavior near the
boundaries $t = t_1$ and $t = t_{\bar{l}}$ of the acquisition window. These issues, if not well
understood and managed, can result in an inaccurate time-frequency conversion, and
degrade model quality.

The second option is to use the time domain VF algorithm [30, 31], which directly
extracts (8.58) from the time domain samples $u(t_l)$ and $y(t_l)$. This is achieved by rewriting
the fitting error (8.17) in the time domain, where multiplication by partial fraction
$1/(s - p_n)$ becomes a convolution between $e^{p_n t}$ and the input or output samples. These
convolutions can be computed by numerical integration, leading to a time domain
version of VF which closely follows the steps of the original
frequency domain VF algorithm [35].
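
As an illustration of how such convolutions can be evaluated, the sketch below computes samples of $x(t) = \int_0^t e^{p(t-\tau)} u(\tau)\,d\tau$ with a recursive update. It assumes a constant sampling step and an input held constant over each step (zero-order hold); this integration rule and the function name are assumptions made for the example, and the cited algorithms may use a different rule.

```python
import cmath


def convolve_with_exponential(p, u, dt):
    """Samples of x(t) = int_0^t exp(p (t - tau)) u(tau) dtau for a piecewise-constant u."""
    decay = cmath.exp(p * dt)
    gain = (decay - 1) / p        # exact integral of exp(p * tau) over one step (p != 0)
    x = [0j]
    for u_l in u[:-1]:
        # Propagate the previous state and add the contribution of the current step.
        x.append(decay * x[-1] + gain * u_l)
    return x


p, dt = -2.0, 0.1
x = convolve_with_exponential(p, [1.0] * 6, dt)   # unit step input
print([round(v.real, 6) for v in x])
```

For a unit step input the recursion reproduces the analytic result $(e^{pt} - 1)/p$ at the sampling instants, which gives a simple way to validate an implementation.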

The time domain VF algorithm leads to a model in the continuous time domain.
Alternatively, if the sampling period $\Delta t = t_{l+1} - t_l$ is constant, one can also apply the
z-domain VF [59], which relies on the z transform as opposed to the Laplace transform.
This latter algorithm leads to a model in the discrete time domain, which can be
expressed as a digital filter or as a set of difference equations (as opposed to differential
equations).
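
The link between a discrete time pole-residue model and difference equations can be sketched as follows. The example assumes a scalar model of the form $H(z) = r_0 + \sum_n r_n/(z - p_n)$, which is an assumption made here for illustration (the parametrization used in [59] may differ); each partial fraction then corresponds to the first-order recursion $x_n[k+1] = p_n x_n[k] + u[k]$.

```python
def simulate_discrete_model(r0, residues, poles, u):
    """Output of H(z) = r0 + sum_n r_n / (z - p_n) driven by the input sequence u."""
    states = [0j] * len(poles)
    y = []
    for u_k in u:
        # Output equation: direct term plus weighted states.
        y.append(r0 * u_k + sum(r * x for r, x in zip(residues, states)))
        # Difference equation of each partial fraction: x_n[k+1] = p_n x_n[k] + u[k].
        states = [p * x + u_k for p, x in zip(poles, states)]
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
print(simulate_discrete_model(1.0, [2.0], [0.5], impulse))
```

The impulse response of each term is $r_n p_n^{k-1}$ for $k \geq 1$, so the model is stable when all poles lie inside the unit circle.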


8.4.2 Improved Vector Fitting formulations 


In the QuadVF algorithm [23], a quadrature rule inspired by the $\mathcal{H}_2$ error measure is
used in conjunction with a suitable choice of frequency sampling points to improve
the fidelity of the reduced model to the given samples. The same work also shows how
one can incorporate derivative information, making QuadVF able to minimize a discrete
Sobolev norm. In [24], this approach is extended to the multi-input multi-output
case, and a way to control the McMillan degree¹ of the approximation is proposed,
which helps to achieve smaller reduced models when $\bar{q}$ and $\bar{m}$ are high.

The numerical robustness of VF, which is already quite remarkable in its original 
formulation, is further improved in the Orthonormal VF algorithm [21]. This algorithm 
replaces partial fractions $1/(s - p_n)$ in (8.15) and (8.16) with orthonormal rational
functions, achieving better numerical conditioning of the linear system (8.38) to be solved.

Another subject that received considerable attention is the robustness of VF
against noise in the given samples $H_k$. Noise may arise from the measurement process
or, if samples were obtained with a numerical simulation, from round-off errors,
approximations, and convergence issues. The relaxed normalization discussed in
Section 8.3.11.2 improves VF convergence in the presence of noise [41]. Furthermore, the
VF with adding and skimming includes a mechanism to detect spurious poles caused
by noise [33]. Since spurious poles impair VF convergence, they must be removed
throughout iterations [33]. This mechanism is coupled with a robust way to adaptively
refine model order $\bar{n}$ to maximize accuracy even when noise is significant [33]. Taking
into account noise variance in the definition of the VF fitting error was also shown to
improve convergence [26]. Finally, instrumental variables can be used to unbias the
VF process from the effects of noise, leading to better accuracy and convergence at no
additional cost [10].


8.4.3 VF algorithms for distributed systems 


The efficient modeling of distributed systems is an open problem in model order reduction.
A system is distributed when the time a signal takes to propagate from an input
to an output is not negligible. In systems described by a Helmholtz (wave) equation,
this happens when the physical size of the system is not negligible compared to the
wavelength. Propagation delays lead to the presence of irrational terms in the transfer
function of the underlying system. Typically, these terms are in the form $e^{-s\tau}$, where $\tau$
is the propagation delay. Rational functions, including the partial fractions in (8.58),
can fit these irrational terms up to arbitrary accuracy. However, if $\tau$ is not
negligible, the required order may be large, and will quickly increase as $\tau$ grows. This
leads to a large model which may burden subsequent simulations.

¹ The McMillan degree [93] of a matrix transfer function $H(s)$ is the order of a minimal state space
realization of $H(s)$, such as the order $N$ of the Gilbert realization discussed in Section 8.3.9.

To overcome this issue, the core idea is to explicitly include exponential terms
$e^{-s\tau_i}$ in the reduced model which will be fitted to the given samples. A popular choice
is to define each element $H_{qm}(s)$ of the model transfer function as

$$H_{qm}(s) = \sum_{i} \left[ R_{0,i} + \sum_{n=1}^{\bar{n}_i} \frac{R_{n,i}}{s - p_{n,i}} \right] e^{-s\tau_i}, \tag{8.78}$$

where the $qm$ subscript was omitted from all coefficients for clarity. The exponential
factors in (8.78) are meant to efficiently capture long propagation delays, while the
rational terms between brackets will resolve the residual behavior of the system.
Typically, since long propagation delays are already accounted for by the exponential
terms, the order $\bar{n}_i$ of the rational factors can be kept quite low.
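
Evaluating a delayed rational model of this kind is straightforward once delays, poles and residues are known. The sketch below assumes a scalar model given as a sum of low-order rational factors, each multiplied by its own delay term $e^{-s\tau_i}$; the dictionary layout (`r0`, `residues`, `poles`, `tau`) is a hypothetical data structure chosen for the example.

```python
import cmath


def delayed_rational_response(s, terms):
    """Evaluate sum_i [r0_i + sum_n r_n / (s - p_n)] * exp(-s * tau_i) at a complex frequency s."""
    total = 0j
    for term in terms:
        rational = term["r0"] + sum(
            r / (s - p) for r, p in zip(term["residues"], term["poles"])
        )
        total += rational * cmath.exp(-s * term["tau"])
    return total


# Hypothetical model: a direct path plus an attenuated, delayed low-pass path.
terms = [
    {"r0": 1.0, "residues": [], "poles": [], "tau": 0.0},
    {"r0": 0.0, "residues": [0.5], "poles": [-1.0], "tau": 2.0},
]
print(delayed_rational_response(1j * 0.5, terms))
```

A pure delay is represented exactly by a single term with no poles, whereas a purely rational model would need a high order to approximate the same phase behavior.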

For systems with uniform cross-section along the direction of propagation, such
as electrical transmission lines and fluid pipes, VF is used in conjunction with the
method of characteristics to obtain an efficient distributed model [50, 2, 36, 61]. For
distributed systems of general shape, several VF algorithms with delay terms have
been proposed [15, 16, 13, 79, 67, 58]. In these algorithms, the first step is to identify
the values of the relevant propagation delays $\tau_i$ present in the system. Given
only frequency samples $H_k$, this is not a trivial task, and the dominant approach is
to exploit time-frequency decompositions [39, 32, 67, 48]. Next, the coefficients of
the remaining rational factors in the model are determined with a VF-like iterative
process [16, 13, 79, 67, 58].


8.4.4 Parametric VF algorithms 


The design process of an engineering system typically requires a large number of sim- 
ulations for different values of design parameters, such as material properties, geo- 
metrical dimensions and operating conditions (e. g. bias voltages, temperature, ...). 
In early design stages, parametric simulations are used to explore the design space. 
Later on, they may be used to optimize design in order to meet specifications or im- 
prove performance. Moreover, parametric simulations also help designers to account 
for manufacturing variability during design. In the context of parametric simulations, 
conventional VF models may be inefficient. Indeed, every time a parameter changes, 
a new set of samples $H_k$ must be obtained, and the fitting process has to be repeated
from scratch.

A better solution is to create a parametric VF model which captures the system response
with respect to both frequency $s$ and some parameters of interest.
The core idea behind parametric VF techniques [83, 80, 64, 20, 34] is to let residues
$R_n$ and poles $p_n$ in (8.58) be parameter-dependent functions, such as polynomials in
the parameters. Their coefficients can be determined with an iterative process analogous
to the Sanathanan-Koerner iteration in Section 8.2, starting from samples of the system’s
frequency response obtained for multiple values of the parameters. The
main advantage of a parametric model is that, once generated, it can be reused many
times for different parameter values within its range of validity. One of the challenges
in the generation of parametric VF models is how to guarantee that the model will be
stable and passive over the desired parameter range [81, 85, 25]. Recently, systematic
solutions to this challenging problem have been proposed [92].


8.5 Conclusion 


This chapter introduced the Vector Fitting algorithm, which has become one of the 
most popular tools for the extraction of linear reduced order models from samples of 
their response, collected in the frequency or in the time domain. Vector Fitting pro- 
duces a rational model which approximately minimizes the least-squares error be- 
tween the given samples and the model response. Determining model coefficients is 
originally a nonlinear least-squares problem, whose solution is prone to the typical 
issues of nonlinear minimization: high computational cost and problematic conver- 
gence due to local minima. Vector Fitting overcomes these issues by iteratively min- 
imizing a linearization of the original problem, leveraging well-established methods 
for the solution of linear least-squares problems. Several strategies to obtain a robust 
and efficient implementation of VF have been reviewed. When properly implemented, 
Vector Fitting enjoys remarkable robustness, efficiency and versatility, typically con- 
verging in a handful of iterations. Finally, we reviewed the most prominent extensions 
of the original algorithm which have been proposed for data-driven modeling of time 
domain systems, noisy samples, distributed systems, and parametric systems. 

Vector Fitting’s superior performance and reliability have led to its widespread use
in many different fields. Originally conceived to predict how transients propagate
throughout power distribution networks, VF is the method of choice for the wideband
modeling of overhead lines, underground cables and power transformers [61, 40, 62,
4, 35]. In electronic engineering, VF is extensively used to model the propagation of
high-speed signals through interconnect networks found at the chip, package and
printed circuit board level. These models are crucial for system design, and greatly
help in preventing signal integrity, power integrity and electromagnetic compatibility
issues [2, 68, 55, 74, 64, 1, 90]. The impact of VF in this area is confirmed by the fact that
all leading commercial tools for the design of high-frequency electronic circuits
include a VF module. Applications in microwave engineering [56, 84, 19, 18] and digital
filter design [89] have also been reported. Within computational electromagnetism,
VF can be used to efficiently model the Green function of layered media, which is
necessary to solve Maxwell’s equations with integral equation methods [49, 11, 65].
The ability of VF to generate models compatible with transient simulations has also
been exploited in the finite difference time domain (FDTD) method [57, 60], the finite
element time domain method [12, 87], and the discontinuous Galerkin method [91].
Beyond electrical engineering, VF has found numerous applications in various domains,
including acoustics [17, 66], fluid dynamics [3, 45, 35], mechanical engineering [35, 6],
and the thermal modeling of chemical batteries [44]. For a collection of VF
applications and additional references, the reader is referred to [35].


9 Kernel methods for surrogate modeling


Abstract: This chapter deals with kernel methods as a special class of techniques for 
surrogate modeling. Kernel methods have proven to be efficient in machine learn- 
ing, pattern recognition and signal analysis due to their flexibility, excellent experi- 
mental performance and elegant functional analytic background. These data-based 
techniques provide so called kernel-expansions, i.e., linear combinations of kernel 
functions which are generated from given input-output point samples that may be 
arbitrarily scattered. In particular, these techniques are meshless, do not require or 
depend on a grid, hence are less prone to the curse of dimensionality, even for high- 
dimensional problems. 

In contrast to projection-based model reduction, we do not necessarily assume a 
high-dimensional model, but a general function that models input-output behavior 
within some simulation context. This could be some micro-model in a multiscale sim- 
ulation, some submodel in a coupled system, some initialization function for solvers, 
coefficient function in Partial Differential Equations (PDEs), etc. 

First, kernel surrogates can be useful if the input-output function is expensive 
to evaluate, e. g. as a result of a finite element simulation. Here, acceleration can be 
obtained by sparse kernel expansions. Second, if a function is available only via mea- 
surements or a few function evaluation samples, kernel approximation techniques 
can provide function surrogates that allow for global evaluation. 

We present some important kernel approximation techniques, which are kernel 
interpolation, greedy kernel approximation and support vector regression. Pseudo- 
code is provided for ease of reproducibility. In order to illustrate the main features, 
commonalities and differences, we compare these techniques on a real-world application.
The experiments clearly indicate the enormous acceleration potential.


9.1 Introduction


This chapter deals with kernel methods as tools to construct surrogate models of ar- 
bitrary functions, given a finite set of arbitrary samples. 

These methods generate approximants based solely on input-output pairs of 
the unknown function, without geometrical constraints on the sample locations. In 
particular, the surrogates do not necessarily depend on the knowledge of a
high-dimensional model but only on its observed input-output behavior at the sample
sites, and they can be applied on arbitrarily scattered points in high dimension. 

These features are particularly useful when these methods are applied within 
some simulation context. For example, kernel surrogates can be useful if the input- 
output function is expensive to evaluate, e. g. is a result of a finite element simulation. 
Here, acceleration can be obtained by sparse kernel expansions. Moreover, if a func- 
tion is available only via measurements or a few function evaluation samples, kernel 
approximation techniques can provide function surrogates that allow global evalua- 
tion. 

Kernel methods are used with much success in Model Order Reduction, and far 
beyond the scope of this chapter. For example, they have been used in the modeling 
of geometry transformations and mesh coupling [3, 12, 13], and in mesh repair meth- 
ods [33], or in the approximation of stability factors and error indicators [14, 32, 34], 
where only a few samples of the exact indicators are sufficient to construct an effi- 
cient surrogate to be used in the online phase. Moreover, kernel methods have been 
combined with projection-based MOR methods, e. g. to obtain simulation-based clas- 
sification [60], or to derive multi-fidelity Monte Carlo approximations [40]. Kernel sur- 
rogates have been employed in optimal control problems [51, 59], in the coupling of 
multi-scale simulations in biomechanics [25, 69], in real time prediction for parame- 
ter identification and state estimation in biomechanical systems [29], in gas transport 
problems [22], in the reconstruction of potential energy surfaces [30], in the forecast- 
ing of time stepping methods [6], in the reduction of nonlinear dynamical systems [67], 
in uncertainty quantification [28], and for nonlinear balanced truncation of dynami- 
cal systems [5]. 

In further generality, there exist many kernel-based algorithms and application
fields that we do not address here. Chief among them is the solution of PDEs, for which
several approaches have emerged in recent years, and which in particular allow one
to solve problems with unstructured grids on general geometries, including
high-dimensional manifolds (see e. g. [11, 17]). Moreover, several other techniques are studied
within Machine Learning, such as classification, density estimation, novelty detection
or feature extraction (see e. g. [53, 54]).

Furthermore, we remark that these methods are members of the larger class of
machine learning and approximation techniques, which are generally suitable to construct
models based on samples to make predictions on new inputs. These models are
usually referred to as surrogates when they are then used as replacements of the model
that generated the data, as they are able to provide an accurate and faster response.
Some examples of these techniques are classical approximation methods such as polynomial
interpolation, which are used in this context especially in combination with
sparse grids to deal with high-dimensional problems (see [19]), and (deep) neural network
models. The latter in particular have seen a huge increase in analysis and application
in recent years. For a recent treatment of deep learning, we refer e. g. to
[21].

Despite these very diverse applications and methodologies, kernel methods can 
be analyzed to some extent in the common framework of Reproducing Kernel Hilbert 
spaces and, although the focus of this chapter will be on the construction of sparse 
surrogate models, parts of the following discussion can be the starting point for the 
analysis of other techniques. 

In general terms, kernel methods can be viewed as nonlinear versions of linear
algorithms. As an example, assume we have a set $X_n := \{x_k\}_{k=1}^n \subset \mathbb{R}^d$ of data points
and target data values $Y_n := \{y_k\}_{k=1}^n \subset \mathbb{R}$. We can construct a surrogate $s : \mathbb{R}^d \to \mathbb{R}$
that predicts new data via linear regression, i. e., find $w \in \mathbb{R}^d$ s. t. $s(x) := \langle w, x \rangle$, where
$\langle \cdot, \cdot \rangle$ is the scalar product in $\mathbb{R}^d$. A good surrogate model $s$ will give predictions such
that $|s(x_k) - y_k|$ is small. If we can write $w \in \mathbb{R}^d$ as $w = \sum_{j=1}^n \alpha_j x_j$ for a set of coefficients
$(\alpha_j)_{j=1}^n \in \mathbb{R}^n$, then $s$ can be rewritten as

$$s(x) := \sum_{j=1}^n \alpha_j \langle x_j, x \rangle.$$

Note that this formulation also includes regression with an offset (or bias) $b \neq 0$, which
can be written in this form by an extended representation as

$$s(x) := \langle w, x \rangle + b =: \langle \tilde{w}, \tilde{x} \rangle,$$

where $\tilde{x} := (x, 1)^T \in \mathbb{R}^{d+1}$ and $\tilde{w} := (w, b)^T \in \mathbb{R}^{d+1}$.

Using now the Gramian matrix $A \in \mathbb{R}^{n \times n}$ with entries $A_{ij} := \langle x_i, x_j \rangle$ and rows $A_i^T \in \mathbb{R}^n$,
we look for the surrogate $s$ which minimizes

$$\sum_{i=1}^n (s(x_i) - y_i)^2 = \sum_{i=1}^n (A_i^T \alpha - y_i)^2 = \|A\alpha - y\|_2^2.$$

Additionally, a regularization term can be added to keep the norm of $\alpha$ small, e. g. in
terms of the value $\alpha^T A \alpha$. Thus, the surrogate can be characterized as the solution of
the optimization problem

$$\min_{\alpha \in \mathbb{R}^n} \|A\alpha - y\|_2^2 + \lambda \alpha^T A \alpha,$$

i. e., $\alpha = (A + \lambda I)^{-1} y$ if $\lambda > 0$. Indeed, since $A$ is symmetric, the gradient of the objective
is $2A((A + \lambda I)\alpha - y)$, which vanishes for this choice of $\alpha$.

In many cases this (regularized) linear regression is not sufficient to obtain a good
surrogate. A possible idea is to combine this simple linear method with a nonlinear
function which maps the data to a higher dimensional space, where the hope is
that the image of the data can be processed linearly. For this we consider a so-called
feature map $\Phi : \mathbb{R}^d \to \mathcal{H}$, where $\mathcal{H}$ is a Hilbert space, and apply the same algorithm
to the transformed data $\Phi(X_n) := \{\Phi(x_i)\}_{i=1}^n$ with the same values $Y_n$. Since the algorithm
depends on $X_n$ only via the Gramian $A$, it is sufficient to replace it with the new
Gramian $A_{ij} := \langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{H}}$ to obtain a nonlinear algorithm.

We will see that $\langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ in fact defines a positive definite kernel, and if any
numerical procedure can be written in terms of inner products of the inputs, it can
be transformed in the same way into a new nonlinear algorithm simply by replacing
the inner products with kernel evaluations (the so-called kernel trick). We will discuss
the details of this procedure in the next sections in the case of interpolation and Support
Vector Regression, but this already gives a glimpse of the wide spectrum of
algorithms in the class of kernel methods.
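
The kernel trick described above can be illustrated with the regularized regression of this section: the solver only sees a Gramian, so replacing the linear Gramian with a kernel Gramian turns the same code into a nonlinear method. The following Python sketch assumes NumPy, a Gaussian kernel with unit shape parameter, and a regularization parameter of $10^{-6}$; all function names and parameter values are illustrative choices, not taken from the chapter.

```python
import numpy as np


def fit_coefficients(A, y, lam):
    """Solve (A + lam I) alpha = y for a given Gramian A."""
    return np.linalg.solve(A + lam * np.eye(len(y)), y)


def linear_gramian(X, Z):
    """Entries <x_i, z_j> of the Euclidean inner product."""
    return X @ Z.T


def gaussian_gramian(X, Z, gamma=1.0):
    """Entries exp(-gamma^2 ||x_i - z_j||^2) of a Gaussian kernel."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma ** 2 * sq_dist)


def predict(gramian, X_train, alpha, X_new):
    """Surrogate s(x) = sum_j alpha_j k(x_j, x), with k defined by the Gramian function."""
    return gramian(X_new, X_train) @ alpha


X = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]    # 20 training points in R^1
y = np.sin(X[:, 0])
X_new = np.array([[1.0], [2.5], [4.0]])

for name, gramian in [("linear", linear_gramian), ("Gaussian", gaussian_gramian)]:
    alpha = fit_coefficients(gramian(X, X), y, lam=1e-6)
    error = np.abs(predict(gramian, X, alpha, X_new) - np.sin(X_new[:, 0])).max()
    print(f"{name} Gramian: max error {error:.2e}")
```

Only the Gramian function changes between the two runs; the linear surrogate cannot follow the sine, while the kernel surrogate reproduces it closely at the test points.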

This chapter is organized as follows. Section 9.2 covers the basic notions on ker- 
nels and kernel-based spaces which are necessary for the development and under- 
standing of the algorithms. The next Section 9.3 presents the general ideas and tools to 
construct kernel surrogates as characterized by the Representer Theorem, and these 
ideas are specialized to the case of kernel interpolation in Section 9.4 and Support 
Vector Regression in Section 9.5. In both cases, we provide the theoretical founda- 
tions as well as the algorithmic description of the methods, with particular attention 
to techniques to enforce sparsity in the model. These surrogates can be used to per- 
form various analyses of the full model, and we give some examples in Section 9.6. 
Section 9.7 presents a general strategy to choose the various parameters defining the 
model, whose tuning can be critical for a successful application of the algorithms. Fi- 
nally, we discuss in Section 9.8 the numerical results of the methods on a real applica- 
tion dataset, comparing training time (offline), prediction time (online), and accuracy. 


9.2 Background on kernels 


We start by introducing some general facts of positive definite kernels. Further de- 
tails on the general analytical theory of reproducing kernels can be found e. g. in the 
recent monograph [45], while the books [15, 65] and [53, 55] contain a treatment of ker- 
nel theory from the point of view of pattern analysis and scattered data interpolation, 
respectively. 


9.2.1 Positive definite kernels 


Given a nonempty set $\Omega$, which can be a subset of $\mathbb{R}^d$, $d \in \mathbb{N}$, but also a set of structured
objects such as strings or graphs, a real- and scalar-valued kernel $K$ on $\Omega$ is a bivariate
symmetric function $K : \Omega \times \Omega \to \mathbb{R}$, i. e., $K(x, y) = K(y, x)$ for all $x, y \in \Omega$. For our
purposes, we are interested in (strictly) positive definite kernels, defined as follows.


Definition 9.1 (Positive definite kernels). Let $\Omega$ be a nonempty set. A kernel $K$ on $\Omega$ is
positive definite (PD) on $\Omega$ if for all $n \in \mathbb{N}$ and for any set of $n$ pairwise distinct elements
$X_n := \{x_i\}_{i=1}^n \subset \Omega$, the kernel matrix (or Gramian matrix) $A := A_{X_n} \in \mathbb{R}^{n \times n}$ defined as
$A_{ij} := K(x_i, x_j)$, $1 \leq i, j \leq n$, is positive semidefinite, i. e., for all vectors $\alpha := (\alpha_i)_{i=1}^n \in \mathbb{R}^n$
we have

$$\alpha^T A \alpha = \sum_{i,j=1}^n \alpha_i \alpha_j K(x_i, x_j) \geq 0. \tag{9.1}$$

The kernel is strictly positive definite (SPD) if the kernel matrix is positive definite,
i. e., (9.1) holds with strict inequality when $\alpha \neq 0$.
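
Condition (9.1) can be probed numerically on a given point set by inspecting the eigenvalues of the kernel matrix. The sketch below assumes NumPy and scalar inputs; the helper names and the tolerance are illustrative. Such a test on one point set can refute positive definiteness, but it cannot prove it, since Definition 9.1 quantifies over all point sets.

```python
import numpy as np


def kernel_matrix(kernel, points):
    """Kernel matrix A_ij = K(x_i, x_j) on a list of points."""
    return np.array([[kernel(x, y) for y in points] for x in points])


def is_positive_semidefinite(A, tol=1e-10):
    """Check symmetry and nonnegativity of the eigenvalues of A, up to a tolerance."""
    return bool(np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() >= -tol)


def gaussian(x, y):
    return np.exp(-(x - y) ** 2)


def negative_distance(x, y):
    # Symmetric, but condition (9.1) fails: the matrix has zero trace and is nonzero.
    return -abs(x - y)


points = [0.0, 0.3, 1.0, 2.5]
print(is_positive_semidefinite(kernel_matrix(gaussian, points)))
print(is_positive_semidefinite(kernel_matrix(negative_distance, points)))
```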


The further class of conditionally (strictly) positive definite kernels is also of in- 
terest in certain contexts. We refer to [65, Chapter 8] for their extensive treatment, and 
we just mention that they are defined as above, except that the condition (9.1) has to 
be satisfied only for the subset of coefficients a which match a certain orthogonality 
condition. When this condition is defined with respect to a space of polynomials of 
degree $m \in \mathbb{N}$, the resulting kernels are used e. g. to guarantee a certain polynomial
exactness of the given approximation scheme, and they are often employed in certain 
methods for the solution of PDEs. 


9.2.2 Examples and construction of kernels 


Despite the abstract definition, there are several ways to construct functions
$K : \Omega \times \Omega \to \mathbb{R}$ which are (strictly) positive definite kernels, and usually the proper choice of
the kernel is a crucial step in the successful application of the method. We list here a
general strategy to construct kernels, and some notable examples.

An often used, constructive approach to designing a new kernel is via feature 
maps as follows. 


Proposition 9.1 (Kernels via feature maps). Let $\Omega$ be a nonempty set. A feature map $\Phi$
is any function $\Phi : \Omega \to \mathcal{H}$, where $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ is any Hilbert space (the feature space).
The function

$$K(x, y) := \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}, \quad x, y \in \Omega,$$

is a PD kernel on $\Omega$.


Proof. $K$ is symmetric because the inner product is symmetric. It is positive definite because,
by bilinearity, $\alpha^T A \alpha = \| \sum_{i=1}^n \alpha_i \Phi(x_i) \|_{\mathcal{H}}^2 \geq 0$ for any points $x_1, \dots, x_n \in \Omega$
and any coefficient vector $\alpha \in \mathbb{R}^n$.

In many cases, $\mathcal{H}$ is either $\mathbb{R}^m$ with very large $m$ or even an infinite dimensional Hilbert
space. The computation of the possibly expensive $m$- or infinite-dimensional inner
product can be avoided if a closed form for $K$ can be obtained. This implies a significant
reduction of the computational time required to evaluate the kernel and thus to
execute any kind of algorithm.

We see now some examples. 


Example 9.1 (Expansion kernels). The construction comprises finite dimensional linear
combinations, i. e., for a set of functions $\{v_j\}_{j=1}^m : \Omega \to \mathbb{R}$, the function
$K(x, y) := \sum_{j=1}^m v_j(x) v_j(y)$ is a positive definite kernel, having a feature map

$$\Phi(x) := (v_1(x), v_2(x), \dots, v_m(x))^T \in \mathcal{H} := \mathbb{R}^m. \tag{9.2}$$

This idea can be extended to an infinite number of functions provided
$(v_j(x))_{j=1}^{\infty} \in \mathcal{H} := \ell_2(\mathbb{N})$ uniformly in $\Omega$, and the resulting kernels are called Hilbert-Schmidt or
expansion kernels, which can be proven to be even SPD under additional conditions
(see [49]). As an example in $d = 1$, we mention the Brownian Bridge kernel
$K(x, y) := \min(x, y) - xy$, defined with a feature map $v_j(x) := \sqrt{2}\,(j\pi)^{-1} \sin(j\pi x)$ for $j \in \mathbb{N}$, which
is SPD on $\Omega := (0, 1)$. We remark that the kernel can be extended to $(0, 1)^d$ with $d > 1$
using a tensor product of one-dimensional kernels.

This feature map representation also proves that $\dim(\mathcal{H}) =: m < \infty$ means that the
kernel is not SPD in general: e. g., if $X_n$ contains $n$ pairwise distinct points and $m < n$,
then the vectors $\{\Phi(x_i)\}_{i=1}^n$ cannot be linearly independent, and thus the kernel matrix
is singular.
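
The Brownian Bridge expansion of Example 9.1 can be checked numerically by truncating the series $\sum_j v_j(x) v_j(y)$ and comparing it with the closed form $\min(x, y) - xy$. The Python sketch below uses only the standard library; the truncation length is an arbitrary illustrative choice, and the truncation error decays only like $1/m$.

```python
import math


def brownian_bridge_truncated(x, y, m):
    """Sum of the first m terms v_j(x) v_j(y), with v_j(x) = sqrt(2) sin(j pi x) / (j pi)."""
    return sum(
        2.0 * math.sin(j * math.pi * x) * math.sin(j * math.pi * y) / (j * math.pi) ** 2
        for j in range(1, m + 1)
    )


x, y = 0.3, 0.7
closed_form = min(x, y) - x * y
for m in (10, 100, 2000):
    print(m, abs(brownian_bridge_truncated(x, y, m) - closed_form))
```

The slow decay of the truncation error illustrates why the closed form of a kernel is preferred over an explicit feature map whenever it is available.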


Example 9.2 (Kernels for structured data). Feature maps are also employed to construct
positive definite kernels on sets $\Omega$ of structured data, such as sets of strings,
graphs, or any other object. For example, the convolution kernels introduced in
[20, 26] consider a finite set of features $v_1(x), \dots, v_m(x) \in \mathbb{R}$ of an object $x \in \Omega$, and
define a feature map exactly as in (9.2).


Example 9.3 (Polynomial kernels). For $a \geq 0$, $p \in \mathbb{N}$, $x, y \in \mathbb{R}^d$, the polynomial kernel

$$K(x, y) := (\langle x, y \rangle + a)^p = \Big(a + \sum_{i=1}^d x^{(i)} y^{(i)}\Big)^p, \quad x := (x^{(1)}, \dots, x^{(d)})^T, \tag{9.3}$$

is PD on any $\Omega \subset \mathbb{R}^d$. It is a $d$-variate polynomial of degree $p$, which contains the monomial terms of degrees $j := (j^{(1)}, \dots, j^{(d)}) \in J$, for a certain set $J \subset \mathbb{N}_0^d$. If $m := |J|$, a feature space is $\mathbb{R}^m$ with feature map

$$\Phi(x) := (\sqrt{a_1}\, x^{j_1}, \dots, \sqrt{a_m}\, x^{j_m})^T,$$

for some positive numbers $\{a_j\}_{j=1}^m$ and monomials $x^{j} := \prod_{i=1}^d (x^{(i)})^{j^{(i)}}$.

Observe that using the closed form (9.3) of the kernel instead of the feature map is very convenient, since we work with $d$-dimensional instead of $m$-dimensional vectors, where possibly $m := |J| = \binom{d+p}{d} = \dim(\mathbb{P}_p(\mathbb{R}^d)) \gg d$.
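As a small worked instance of this example, the sketch below compares the closed form (9.3) with an explicit feature map for the special case $d = 2$, $p = 2$. The function names are illustrative, and the feature map is obtained by expanding $(x^{(1)} y^{(1)} + x^{(2)} y^{(2)} + a)^2$ by hand.

```python
import numpy as np
from math import comb

def poly_kernel(x, y, a=1.0, p=2):
    """Closed form (<x, y> + a)^p of the polynomial kernel."""
    return (np.dot(x, y) + a) ** p

def poly_features_d2_p2(x, a=1.0):
    """Explicit feature map for d = 2, p = 2 (hand-expanded, illustrative)."""
    x1, x2 = x
    return np.array([a,
                     np.sqrt(2.0 * a) * x1, np.sqrt(2.0 * a) * x2,
                     x1**2, x2**2,
                     np.sqrt(2.0) * x1 * x2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.25])
k_closed = poly_kernel(x, y)                               # works with d = 2 numbers
k_feat = poly_features_d2_p2(x) @ poly_features_d2_p2(y)   # works with m = 6 numbers
m = comb(2 + 2, 2)                                         # binom(d + p, d)
```

Both evaluations agree up to round-off, but the closed form never forms the $m$-dimensional feature vectors.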


Example 9.4 (RBF kernels). For $\Omega \subset \mathbb{R}^d$, in many applications the most used kernels are translational invariant kernels, i. e., there exists a function $\phi : \mathbb{R}^d \to \mathbb{R}$ with

$$K(x, y) := \phi(x - y), \quad x, y \in \Omega,$$

and in particular radial kernels, i. e., there exists a univariate function $\phi : \mathbb{R}_{\geq 0} \to \mathbb{R}$ with

$$K(x, y) := \phi(\|x - y\|), \quad x, y \in \Omega.$$

A radial kernel, or Radial Basis Function (RBF), is usually defined up to a shape parameter $\gamma > 0$ that controls the scale of the kernel via $K(x, y) := \phi(\gamma \|x - y\|)$.

The main example of such kernels is the Gaussian $K(x, y) := e^{-\gamma^2 \|x - y\|^2}$, which is in fact strictly positive definite. An explicit feature map has been computed in [56]: If $\Omega \subset \mathbb{R}^d$ is nonempty, a feature map is the function $\Phi_\gamma : \Omega \to L_2(\mathbb{R}^d)$ defined by

$$\Phi_\gamma(x) := \frac{(2\gamma)^{d/2}}{\pi^{d/4}} \exp(-2\gamma^2 \|x - \cdot\|^2), \quad x \in \Omega.$$

In this case it is even more evident how working with the closed form of $K$ is much more efficient than working with a feature map and computing $L_2$-inner products.

RBF kernels are particularly easy to implement in arbitrary space dimension $d$. The evaluation of the kernel $K(\cdot, x)$, $x \in \mathbb{R}^d$, on a vector of $n$ points can indeed be realized by first computing a distance vector $D \in \mathbb{R}^n$, $D_i := \|x - x_i\|$, and then applying the univariate function $\phi$ on $D$. A discussion and comparison of different algorithms (in Matlab) to efficiently compute a distance matrix can be found in [15, Chapter 4], and most scientific computing languages comprise a built-in implementation (such as pdist2¹ in Matlab and distance_matrix² in Scipy).
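The two-step evaluation described above (distances first, then the univariate profile) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain NumPy instead of the built-in routines mentioned in the text, and the helper names `distance_matrix`, `rbf_kernel`, and `gaussian` are chosen here for the example.

```python
import numpy as np

def distance_matrix(X, Y):
    """Pairwise Euclidean distances between the rows of X (n, d) and Y (m, d)."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.sqrt(np.maximum(sq, 0.0))   # clip tiny negative values caused by round-off

def rbf_kernel(X, Y, phi, gamma=1.0):
    """Radial kernel K(x, y) = phi(gamma * ||x - y||) evaluated on all pairs."""
    return phi(gamma * distance_matrix(X, Y))

def gaussian(r):
    """Radial profile of the Gaussian kernel."""
    return np.exp(-r**2)

rng = np.random.default_rng(0)
X = rng.random((5, 3))                    # n = 5 points in dimension d = 3
A = rbf_kernel(X, X, gaussian, gamma=2.0)

# Reference value for one entry, computed directly from the definition.
direct_01 = np.exp(-(2.0**2) * np.sum((X[0] - X[1])**2))
```

The same `rbf_kernel` works for any dimension `d` and any radial profile `phi`, which is the convenience the paragraph refers to.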

Translational invariant and RBF kernels can often be analyzed in terms of their Fourier transforms, which provide proofs of their strict positive definiteness via the Bochner theorem (see e. g. [65, Chapter 6]), and connections to certain Sobolev spaces, as we will briefly see in Section 9.2.3.

Among various RBF kernels, there are also compactly supported kernels, i. e., $K(x, y) = 0$ if $\|x - y\| > 1/\gamma$, which produce sparse kernel matrices if $\gamma$ is large enough. The most used ones are the Wendland kernels introduced in [63], which are even radial polynomial within their support.


1 https://www.mathworks.com/help/stats/pdist2.html
2 https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance_matrix.html

There are, in addition, various operations to combine positive definite kernels and obtain new ones. For example, sums and products of positive definite kernels and multiplication by a positive constant $a > 0$ produce again positive definite kernels. Moreover, if $K'$ is a positive definite kernel and $K''$ is symmetric with $K' \preccurlyeq K''$ (i. e., $K := K'' - K'$ is PD) then also $K''$ is positive definite. Furthermore, if $\Omega = \Omega' \times \Omega''$ and $K'$, $K''$ are PD kernels on $\Omega'$, $\Omega''$, then $K(x, y) := K'(x', y') K''(x'', y'')$ and $K(x, y) := K'(x', y') + K''(x'', y'')$ are also PD kernels on $\Omega$, i. e., kernels can be defined to respect tensor product structures of the input.

Further details and examples can be found in [45, Chapters 1-2].


9.2.3 Kernels and Hilbert spaces 


Most of the analysis of kernel-based methods is possible through the connection with certain Hilbert spaces. We first give the following definition.


Definition 9.2 (Reproducing Kernel Hilbert Space). Let $\Omega$ be a nonempty set, $\mathcal{H}$ a Hilbert space of functions $f : \Omega \to \mathbb{R}$ with inner product $(\cdot, \cdot)_{\mathcal{H}}$. Then $\mathcal{H}$ is called a Reproducing Kernel Hilbert Space (RKHS) on $\Omega$ if there exists a function $K : \Omega \times \Omega \to \mathbb{R}$ (the reproducing kernel) such that

1. $K(\cdot, x) \in \mathcal{H}$ for all $x \in \Omega$,

2. $(f, K(\cdot, x))_{\mathcal{H}} = f(x)$ for all $x \in \Omega$, $f \in \mathcal{H}$ (reproducing property).


The reproducing property is equivalent to stating that, for $x \in \Omega$, the $x$-translate $K(\cdot, x)$ of the kernel is the Riesz representer of the evaluation functional $\delta_x : \mathcal{H} \to \mathbb{R}$, $\delta_x(f) := f(x)$ for $f \in \mathcal{H}$, which is hence a continuous functional in $\mathcal{H}$. The converse also holds, and the following result gives an abstract criterion to check if a Hilbert space is a RKHS.


Theorem 9.1. A Hilbert space $\mathcal{H}$ of functions $\Omega \to \mathbb{R}$ is a RKHS if and only if the point evaluation functionals are continuous in $\mathcal{H}$ for all $x \in \Omega$, i. e., $\delta_x \in \mathcal{H}'$, the dual space of $\mathcal{H}$. Moreover, the reproducing kernel $K$ of $\mathcal{H}$ is strictly positive definite if and only if the functionals $\{\delta_x : x \in \Omega\}$ are linearly independent in $\mathcal{H}'$.


Proof. The first part is clear from the reproducing property, while strict positive definiteness can be checked by verifying that the quadratic form in Definition 9.1 cannot be zero for $\alpha \neq 0$ if $\{\delta_x : x \in \Omega\}$ are linearly independent.


We see two concrete examples. 


Example 9.5 (Finite dimensional spaces). Any finite dimensional Hilbert space $\mathcal{H}$ of functions on a nonempty set $\Omega$ is a RKHS. If $m := \dim(\mathcal{H})$ and $\{v_j\}_{j=1}^m$ is an orthonormal basis, then a reproducing kernel is given by

$$K(x, y) := \sum_{j=1}^m v_j(x) v_j(y), \quad x, y \in \Omega.$$

Indeed, the two properties of Definition 9.2 can be easily verified by direct computation.


Example 9.6 (The Sobolev space $H_0^1(0, 1)$). The Sobolev space $H_0^1(0, 1)$ with inner product $(f, g)_{H_0^1} := \int_0^1 f'(y) g'(y)\, dy$ is a RKHS with the Brownian Bridge kernel

$$K(x, y) := \min(x, y) - xy, \quad x, y \in (0, 1),$$

as reproducing kernel (see e. g. [8]). Indeed, $K(\cdot, x) \in H_0^1(0, 1)$, and the reproducing property (2) follows by explicitly computing the inner product.
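The explicit computation mentioned in this example can also be checked numerically. The sketch below is an illustration under assumptions chosen here: the test function $f(y) = \sin(\pi y)$, the evaluation point $x = 0.3$, and a midpoint quadrature rule are not part of the original text. It uses that $\partial_y K(y, x) = 1 - x$ for $y < x$ and $-x$ for $y > x$.

```python
import numpy as np

def dK_dy(y, x):
    """Derivative in y of K(y, x) = min(y, x) - y * x, valid for y != x."""
    return np.where(y < x, 1.0 - x, -x)

def f(y):
    """Test function with f(0) = f(1) = 0, so that f lies in H_0^1(0, 1)."""
    return np.sin(np.pi * y)

def df(y):
    return np.pi * np.cos(np.pi * y)

n_cells = 10000
y_mid = (np.arange(n_cells) + 0.5) / n_cells   # midpoint quadrature nodes in (0, 1)
x = 0.3                                         # a cell boundary, so the kink of K(., x) is not inside a cell

# Inner product (f, K(., x)) in H_0^1: integral of f'(y) * d/dy K(y, x) over (0, 1).
inner = np.sum(df(y_mid) * dK_dy(y_mid, x)) / n_cells
```

The value of `inner` approximates `f(x)`, as the reproducing property predicts.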


The following result proves that reproducing kernels are in fact positive definite kernels in the sense of Definition 9.1. Moreover, the first two properties are useful to deal with the various types of approximants of Section 9.4 and Section 9.5, which will be exactly of this form.


Proposition 9.2. Let $\mathcal{H}$ be a RKHS on $\Omega$ with reproducing kernel $K$. Let $n, n' \in \mathbb{N}$, $\alpha \in \mathbb{R}^n$, $\alpha' \in \mathbb{R}^{n'}$, $X_n, X'_{n'} \subset \Omega$, and define the functions

$$f(x) := \sum_{i=1}^n \alpha_i K(x, x_i), \quad g(x) := \sum_{j=1}^{n'} \alpha'_j K(x, x'_j), \quad x \in \Omega.$$

Then we have the following:
1. $f, g \in \mathcal{H}$,
2. $(f, g)_{\mathcal{H}} = \sum_{i=1}^n \sum_{j=1}^{n'} \alpha_i \alpha'_j K(x_i, x'_j)$,
3. $K$ is the unique reproducing kernel of $\mathcal{H}$ and it is a positive definite kernel.


Proof. The first two properties follow from Definition 9.2, and in particular from $\mathcal{H}$ being a linear space and from the bilinearity of $(\cdot, \cdot)_{\mathcal{H}}$.

For Property (3), the fact that $K$ is symmetric and positive definite, hence a PD kernel, follows from Property (1) of Definition 9.2, and from the symmetry and positive definiteness of the inner product. Moreover, the reproducing property implies that, if $K$, $K'$ are two reproducing kernels of $\mathcal{H}$, then for all $x, y \in \Omega$ we have

$$K(x, y) = (K(\cdot, y), K'(\cdot, x))_{\mathcal{H}} = K'(x, y).$$


It is common in applications to follow instead the opposite path, i. e., to start with a given PD kernel, and try to see if an appropriate RKHS exists. This is in fact always the case, as proven by the following fundamental theorem from [2].

Theorem 9.2 (RKHS from kernels, Moore-Aronszajn theorem). Let $\Omega$ be a nonempty set and $K : \Omega \times \Omega \to \mathbb{R}$ a positive definite kernel. Then there exists a unique RKHS $\mathcal{H} := \mathcal{H}_K(\Omega)$ with reproducing kernel $K$.


Proof. The theorem was first proven in [2], to which we refer for a detailed proof. The idea is to deduce that, by Property (1) of Proposition 9.2, a candidate RKHS $\mathcal{H}$ of $K$ needs to contain the linear space

$$\mathcal{H}_0 := \operatorname{span}\{K(\cdot, x) : x \in \Omega\}$$

of finite linear combinations of kernel translates. Moreover, from Property (2) of Proposition 9.2, the inner product on this $\mathcal{H}_0$ needs to satisfy

$$(f, g)_{\mathcal{H}_0} = \sum_{i=1}^n \sum_{j=1}^{n'} \alpha_i \alpha'_j K(x_i, x'_j). \tag{9.4}$$

With this observation in mind, the idea of the construction of $\mathcal{H}$ is to start from $\mathcal{H}_0$, prove that (9.4) indeed defines an inner product on $\mathcal{H}_0$, and that the completion of $\mathcal{H}_0$ w. r. t. this inner product is a RKHS having $K$ as reproducing kernel. Uniqueness then follows from Property (3) of the same proposition.


As is common in the approximation literature, we will sometimes refer to this unique $\mathcal{H}$ as the native space of the kernel $K$ on $\Omega$.


Remark 9.1 (Kernel feature map). Among other consequences, this construction allows one to prove that any PD kernel is generated by at least one feature map. Indeed, the function $\Phi : \Omega \to \mathcal{H}$, $\Phi(x) := K(\cdot, x)$, is clearly a feature map for $K$ with feature space $\mathcal{H}$, since the reproducing property implies that

$$(\Phi(x), \Phi(y))_{\mathcal{H}} = (K(\cdot, x), K(\cdot, y))_{\mathcal{H}} = K(x, y) \quad \text{for all } x, y \in \Omega.$$


Remark 9.2. For certain translational invariant kernels it is possible to prove that the associated native space is norm equivalent to a Sobolev space of the appropriate smoothness, which is related to the kernel's smoothness (see [65, Chapter 10]). This is particularly interesting since the approximation properties of the different algorithms, including certain optimality properties that we will see in the next sections, are in fact optimal in these Sobolev spaces (with an equivalent norm).


The various operations on positive definite kernels mentioned in Section 9.2.2 have an analogous effect on the corresponding native spaces. For example, the scaling by a positive number $a > 0$ does not change the native space, but scales the inner product correspondingly, and, if $K' \preccurlyeq K''$ are positive definite kernels, then $\mathcal{H}_{K'}(\Omega) \subset \mathcal{H}_{K''}(\Omega)$. We remark that the latter property has been used for example in [71] to prove inclusion relations for the native spaces of RBF kernels with different shape parameters.


9.2.4 Kernels for vector-valued functions


So far we only dealt with scalar-valued kernels, which are suitable to treat scalar- 
valued functions. Nevertheless, it is clear that the interest in model reduction is typ- 
ically also on vector-valued or multi-output functions, which thus require a general- 
ization of the theory presented so far. This has been done in [35], and it is based on the 
following definition of matrix-valued kernels. 


Definition 9.3 (Matrix-valued PD kernels). Let $\Omega$ be a nonempty set and $q \in \mathbb{N}$. A function $K : \Omega \times \Omega \to \mathbb{R}^{q \times q}$ is a matrix-valued kernel if it is symmetric, i. e., $K(x, y) = K(y, x)^T$ for all $x, y \in \Omega$. It is a PD (resp., SPD) matrix-valued kernel if the kernel matrix $A \in \mathbb{R}^{nq \times nq}$ is positive semidefinite (resp., positive definite) for all $n \in \mathbb{N}$ and for all sets $X_n \subset \Omega$ of pairwise distinct elements.


This more general class of kernels is also associated to a uniquely defined native space of vector-valued functions, where the notion of RKHS is replaced by the following.


Definition 9.4 (RKHS for matrix-valued kernels). Let $\Omega$ be a nonempty set, $q \in \mathbb{N}$, $\mathcal{H}$ a Hilbert space of functions $f : \Omega \to \mathbb{R}^q$ with inner product $(\cdot, \cdot)_{\mathcal{H}}$. Then $\mathcal{H}$ is called a vector-valued RKHS on $\Omega$ if there exists a function $K : \Omega \times \Omega \to \mathbb{R}^{q \times q}$ (the matrix-valued reproducing kernel) such that

1. $K(\cdot, x) v \in \mathcal{H}$ for all $x \in \Omega$, $v \in \mathbb{R}^q$,

2. $(f, K(\cdot, x) v)_{\mathcal{H}} = f(x)^T v$ for all $x \in \Omega$, $v \in \mathbb{R}^q$, $f \in \mathcal{H}$ (directional reproducing property).


A particularly simple version of this construction can be realized by considering separable matrix-valued kernels (see e. g. [1]), i. e., kernels that are defined as $K(x, y) := \tilde{K}(x, y) B$, where $\tilde{K}$ is a standard scalar-valued PD kernel, and $B \in \mathbb{R}^{q \times q}$ is a positive semidefinite matrix. In the special case $B = I$ (the $q \times q$ identity matrix), in [70] it is shown that the native space of $K$ is the tensor product of $q$ copies of the native space of $\tilde{K}$, i. e.,

$$\mathcal{H}_K(\Omega) = \{f : \Omega \to \mathbb{R}^q : f_j \in \mathcal{H}_{\tilde{K}}(\Omega),\ 1 \leq j \leq q\}$$

with

$$(f, g)_{\mathcal{H}_K} = \sum_{j=1}^q (f_j, g_j)_{\mathcal{H}_{\tilde{K}}}.$$

This simplification will give convenient advantages when implementing some of the methods discussed in Section 9.4.


9.3 Data based surrogates


We can now introduce in general terms the two surrogate modeling techniques that we will discuss, namely (regularized) kernel interpolation and Support Vector Regression (SVR).

For both of them, the idea is to represent the expensive map to be reduced as a function $f : \Omega \to \mathbb{R}^q$ that maps an input $x \in \Omega$ to an output $y \in \mathbb{R}^q$. Here $f$ is assumed to be only continuous, and the set $\Omega$ can be arbitrary as long as a positive definite kernel $K$ can be defined on it. Moreover, the function does not need to be known in any particular way other than through its evaluations on a finite set $X_n := \{x_k\}_{k=1}^n \subset \Omega$ of pairwise distinct data points, resulting in data values $Y_n := \{y_k := f(x_k)\}_{k=1}^n \subset \mathbb{R}^q$.

The goal is to construct a function $s \in \mathcal{H}$ such that $s(x)$ is a good approximation of $f(x)$ for all $x \in \Omega$ (and not only for $x \in X_n$), while being significantly faster to evaluate. The process of computing $s$ from the data $(X_n, Y_n)$ is often referred to as training of the surrogate $s$, and the set $(X_n, Y_n)$ is thus called training dataset.

The computation of the particular surrogate is realized as the solution of an infinite dimensional optimization problem. In general terms, we define a loss function

$$L : \mathcal{H} \times \Omega^n \times (\mathbb{R}^q)^n \to \mathbb{R}_{\geq 0} \cup \{+\infty\},$$

which takes as input a candidate surrogate $g \in \mathcal{H}$ and the values $X_n \in \Omega^n$, $Y_n \in (\mathbb{R}^q)^n$, and returns a measure of the data-accuracy of $g$. Then the surrogate $s$ is defined as a minimizer, if it exists, of the cost function

$$J(g) := L(g, X_n, Y_n) + \lambda \|g\|_{\mathcal{H}}^2,$$

where the second part of $J$ is a regularization term that penalizes solutions with large norm. The tradeoff between the data-accuracy term and the regularization term is controlled by the regularization parameter $\lambda \geq 0$.

For the sake of presentation, we restrict in the remainder of this section to the case of scalar-valued functions, i. e., $q = 1$. The general case follows by using matrix-valued kernels as introduced in Section 9.2.4, and the corresponding definition of orthogonal projections.

The following fundamental Representer Theorem characterizes exactly some solutions of this problem, and it proves that the surrogate will be a function

$$s \in V(X_n) := \operatorname{span}\{K(\cdot, x_i),\ x_i \in X_n\},$$

i. e., a finite linear combination of kernel translates on the training points. A first version of this result was proven in [27], while we refer to [52] for a more general statement.


Theorem 9.3 (Representer Theorem). Let $\Omega$ be a nonempty set, $K$ a PD kernel on $\Omega$, $\lambda \geq 0$ a regularization parameter, and let $(X_n, Y_n)$ be a training set. Assume that $L(s, X_n, Y_n)$ depends on $s$ only via the values $s(x_i)$, $x_i \in X_n$.

Then, if the optimization problem

$$\operatorname*{argmin}_{g \in \mathcal{H}} J(g) = L(g, X_n, Y_n) + \lambda \|g\|_{\mathcal{H}}^2 \tag{9.5}$$

has a solution, it has in particular a solution of the form

$$s(x) := \sum_{j=1}^n \alpha_j K(x, x_j), \quad x \in \Omega, \tag{9.6}$$

for suitable coefficients $\alpha \in \mathbb{R}^n$.


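Proof. We prove that for every $g \in \mathcal{H}$ there exists $s \in V(X_n)$ such that $J(s) \leq J(g)$. To see this, we decompose $g \in \mathcal{H}$ as

$$g = s + s^\perp, \quad s \in V(X_n),\ s^\perp \in V(X_n)^\perp.$$

In particular, since $K(\cdot, x_i) \in V(X_n)$, we have by the reproducing property of the kernel

$$s^\perp(x_i) = (s^\perp, K(\cdot, x_i))_{\mathcal{H}} = 0, \quad 1 \leq i \leq n,$$

thus $g(x_i) = s(x_i) + s^\perp(x_i) = s(x_i)$ for $1 \leq i \leq n$, and it follows that $L(g, X_n, Y_n) = L(s, X_n, Y_n)$. Moreover, again by orthogonal projection we have $\|g\|_{\mathcal{H}}^2 = \|s\|_{\mathcal{H}}^2 + \|s^\perp\|_{\mathcal{H}}^2$. Since $\lambda \geq 0$, we obtain

$$J(s) = L(s, X_n, Y_n) + \lambda \|s\|_{\mathcal{H}}^2 = L(g, X_n, Y_n) + \lambda \|s\|_{\mathcal{H}}^2 = L(g, X_n, Y_n) + \lambda \|g\|_{\mathcal{H}}^2 - \lambda \|s^\perp\|_{\mathcal{H}}^2 = J(g) - \lambda \|s^\perp\|_{\mathcal{H}}^2 \leq J(g).$$

Thus, if $g \in \mathcal{H}$ is a solution then $s \in V(X_n)$ is also a solution.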


The existence of a solution will be guaranteed by choosing a convex cost function $J$, i. e., since the regularization term is always convex, by choosing a convex loss function. Then the theorem states that solutions of the infinite dimensional optimization problem can be computed by solving a finite dimensional convex one.

This is a great result, but observe that the evaluation of $s(x)$, $x \in \Omega$, requires the evaluation of the $n$-term linear combination (9.6), where $n$ is the size of the dataset. Assuming that the kernel can be evaluated in constant time, the complexity of this operation is $O(n)$. Thus, to achieve the promised speedup in evaluating the surrogate in place of the function $f$, we will consider in the following methods that enforce sparsity in $s$, i. e., which compute approximate solutions where most of the coefficients $\alpha_j$ are zero. If the nonzero coefficients correspond to an index set $I_N := \{i_1, \dots, i_N\} \subset \{1, \dots, n\}$, the complexity is reduced to $O(N)$.

Taking into account this sparsity and denoting $X_N := \{x_i \in X_n : i \in I_N\}$ and $\alpha := (\alpha_i : i \in I_N)$, we can summarize in Algorithm 9.1 the online phase for any of the following algorithms, consisting in the evaluation of $s$ on a set of points $X_{te} \subset \Omega$. Here and in the following, we denote by $s(X) := (s(x_1), \dots, s(x_m))^T \in \mathbb{R}^m$ the vector of evaluations of $s$ on a set of points $X := \{x_i\}_{i=1}^m \subset \Omega$.

Algorithm 9.1: Kernel surrogate - online phase.

1: Input: $X_N \in \Omega^N$, $\alpha \in \mathbb{R}^{N \times q}$, kernel $K$ (and kernel parameters), test points $X_{te} := \{x_i^{te}\}_{i=1}^{n_{te}} \subset \Omega$.

2: Compute the kernel matrix $A_{te} \in \mathbb{R}^{n_{te} \times N}$, $(A_{te})_{ij} := K(x_i^{te}, x_{i_j})$.

3: Evaluate the surrogate $s(X_{te}) = A_{te} \alpha$.

4: Output: evaluation of the surrogate $s(X_{te}) \in \mathbb{R}^{n_{te} \times q}$.
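A minimal sketch of the online phase of Algorithm 9.1 is given below, assuming a Gaussian kernel and NumPy arrays with one point per row. The helper names `gaussian_kernel` and `evaluate_surrogate`, as well as the centers and coefficients used in the example, are illustrative choices and not part of the original text.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix exp(-gamma^2 ||x - y||^2) for rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma**2 * d2)

def evaluate_surrogate(X_te, X_N, alpha, kernel):
    """Online phase: build A_te (n_te x N) and return s(X_te) = A_te @ alpha."""
    A_te = kernel(X_te, X_N)
    return A_te @ alpha

# Illustrative data: N = 3 centers in d = 1 and q = 2 outputs.
X_N = np.array([[0.0], [0.5], [1.0]])
alpha = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [-1.0, 1.0]])
X_te = np.linspace(0.0, 1.0, 7)[:, None]    # n_te = 7 test points
s_te = evaluate_surrogate(X_te, X_N, alpha, gaussian_kernel)   # shape (n_te, q)
```

The cost is one kernel evaluation per pair of test point and center, i. e., $O(N)$ per test point, independently of the size $n$ of the training set.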


Remark 9.3 (Normalization of the cost function). It is sometimes convenient to weight the loss term in the cost function (9.5) by a factor $1/n$, which normalizes its value with respect to the number of data. We do not use this convention here, and we only remark that this is equivalent to the use of a regularization parameter $\lambda = n \lambda'$ for a given $\lambda' > 0$.


9.4 Kernel interpolation 


The first method that we discuss is (regularized) kernel interpolation. In this case, we consider the square loss function

$$L(s, X_n, Y_n) := \sum_{i=1}^n (s(x_i) - y_i)^2,$$

which measures the pointwise distance between the surrogate and the target data. We have then the following special case of the Representer Theorem. We denote by $y \in \mathbb{R}^n$ the vector of output data, assuming again for now that $q = 1$.


Corollary 9.1 (Regularized interpolant). Let $\Omega$ be a nonempty set, $K$ a PD kernel on $\Omega$, $\lambda \geq 0$ a regularization parameter. For any training set $(X_n, Y_n)$ there exists an approximant of the form

$$s(x) = \sum_{j=1}^n \alpha_j K(x, x_j), \quad x \in \Omega, \tag{9.7}$$

where the vector of coefficients $\alpha \in \mathbb{R}^n$ is a solution of the linear system

$$(A + \lambda I) \alpha = y, \tag{9.8}$$

where $A \in \mathbb{R}^{n \times n}$, $A_{ij} := K(x_i, x_j)$, is the kernel matrix on $X_n$. Moreover, if $K$ is SPD this is the unique solution of the minimization problem (9.5).


Proof. The loss $L$ is clearly convex, so there exists a solution of the optimization problem, and by Theorem 9.3 we know that we can restrict to solutions in $V(X_n)$.

We then consider functions $s := \sum_{j=1}^n \alpha_j K(\cdot, x_j)$ for some unknown $\alpha \in \mathbb{R}^n$. Computing the inner product as in Proposition 9.2, we obtain

$$s(x_i) = \sum_{j=1}^n \alpha_j K(x_i, x_j) = (A\alpha)_i, \qquad \|s\|_{\mathcal{H}}^2 = \sum_{i,j=1}^n \alpha_i \alpha_j K(x_i, x_j) = \alpha^T A \alpha.$$

The functional $J$ restricted to $V(X_n)$ can be parametrized by $\alpha \in \mathbb{R}^n$, and thus it can be rewritten as $J : \mathbb{R}^n \to \mathbb{R}$ with

$$J(\alpha) = \|A\alpha - y\|_2^2 + \lambda \alpha^T A \alpha = (A\alpha - y)^T (A\alpha - y) + \lambda \alpha^T A \alpha = \alpha^T A^T A \alpha - 2 \alpha^T A^T y + y^T y + \lambda \alpha^T A \alpha,$$

which is convex in $\alpha$ since $A$ is positive semidefinite. Since $A$ is symmetric, its gradient is

$$\nabla J(\alpha) = 2 A^T A \alpha - 2 A^T y + 2 \lambda A \alpha = 2 A (A\alpha - y + \lambda \alpha),$$

i. e., $\nabla J(\alpha) = 0$ if and only if $A (A + \lambda I) \alpha = A y$, which is satisfied by $\alpha$ such that $(A + \lambda I)\alpha = y$. If $K$ is SPD then both $A$ and $A + \lambda I$ are invertible, so this is the only solution.
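The computation in this proof can be illustrated with a short numerical sketch for $q = 1$. The Gaussian kernel, the data $y_i = \sin(2\pi x_i)$, the value of $\lambda$, and the helper names are assumptions made only for this example. The sketch solves the linear system (9.8) and checks that the gradient of $J$ derived above vanishes at the computed coefficients.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix exp(-gamma^2 ||x - y||^2) for rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma**2 * d2)

def fit_regularized_interpolant(X, y, kernel, lam=0.0):
    """Assemble the kernel matrix A and solve (A + lam * I) alpha = y."""
    A = kernel(X, X)
    alpha = np.linalg.solve(A + lam * np.eye(X.shape[0]), y)
    return alpha, A

X = np.linspace(0.0, 1.0, 8)[:, None]       # n = 8 training points in d = 1
y = np.sin(2.0 * np.pi * X[:, 0])           # scalar outputs, q = 1
lam = 1e-6

alpha, A = fit_regularized_interpolant(
    X, y, lambda P, Q: gaussian_kernel(P, Q, gamma=3.0), lam)

grad = 2.0 * A @ (A @ alpha - y + lam * alpha)   # gradient of J from the proof
residual = A @ alpha - y                          # s(x_i) - y_i, equal to -lam * alpha_i
```

Note that the training residual equals $-\lambda \alpha$, so for $\lambda > 0$ the data are in general not interpolated exactly.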


The extension to vector-valued functions, i. e. $q > 1$, is straightforward using the separable matrix-valued kernels with $B = I$ of Section 9.2.4. Indeed, in this case the data values are vectors $y_i := f(x_i) \in \mathbb{R}^q$, and thus in the interpolant (9.7) also the coefficients are vectors $\alpha_j \in \mathbb{R}^q$. The linear system (9.8) has the same matrix, but instead $\alpha, y \in \mathbb{R}^{n \times q}$ are defined as

$$\alpha := (\alpha_1^T, \dots, \alpha_n^T)^T, \quad y := (y_1^T, \dots, y_n^T)^T. \tag{9.9}$$

We remark that in the following the values $x_i$, $y_i$, $s(x)$, and $\alpha_j$ have always to be understood as row vectors when $q > 1$. This notation is very convenient when representing the coefficients as the solution of a linear system. Furthermore, the representation of the dataset samples $(x, y)$ is quite natural when dealing with tabular data, where each column represents a feature and each row a sample vector.

For $K$ SPD and pairwise distinct sample locations $X_n$ we can also set $\lambda := 0$ and obtain pure interpolation, i. e., the solution satisfies $L(s, X_n, Y_n) = 0$, or

$$s(x_i) = y_i, \quad 1 \leq i \leq n.$$

Observe that this means that with this method we can exactly interpolate arbitrary continuous functions on arbitrary pairwise distinct scattered data in any dimension, as opposed to many other techniques which require complicated conditions on the interpolation points or a grid structure. Moreover, this approximation process has several optimality properties in $\mathcal{H}$, which remind one of similar properties of spline interpolation.

Proposition 9.3 (Optimality of kernel interpolation). Let $K$ be SPD, $f \in \mathcal{H}$, and $\lambda = 0$. Then $s$ is the orthogonal projection of $f$ in $V(X_n)$, and in particular

$$\|f - s\|_{\mathcal{H}} = \min_{g \in V(X_n)} \|f - g\|_{\mathcal{H}}.$$

Moreover, if $S := \{g \in \mathcal{H} : g(x_i) = f(x_i),\ 1 \leq i \leq n\}$, then

$$\|s\|_{\mathcal{H}} = \min_{g \in S} \|g\|_{\mathcal{H}},$$

i. e., $s$ is the minimal norm interpolant of $f$ on $X_n$.


Proof. The proof is analogous to the proof of the Representer Theorem, using a decomposition $f = g + g^\perp$, and proving that $s = g$.


We will see in Section 9.7 a general technique to tune $\lambda$ using the data, which should return $\lambda = 0$ (or very small) when this is the best option. Nevertheless, also for an SPD kernel there are at least two reasons to still consider regularized interpolation. First, the data can be affected by noise, and thus an exact pointwise recovery does not make much sense. Second, a positive parameter $\lambda > 0$ improves the condition number of the linear system, and thus the stability of the solution. Indeed, the 2-condition number of $A + \lambda I$ is

$$\kappa(\lambda) := \frac{\lambda_{\max}(A + \lambda I)}{\lambda_{\min}(A + \lambda I)} = \frac{\lambda_{\max}(A) + \lambda}{\lambda_{\min}(A) + \lambda},$$

which is a strictly decreasing function of $\lambda$, with $\kappa(0) = \kappa(A)$ and $\lim_{\lambda \to \infty} \kappa(\lambda) = 1$. Moreover (see [66]) this increased stability can be achieved while still controlling the pointwise accuracy. Namely, if $f \in \mathcal{H}$, we have

$$|y_i - s(x_i)| \leq \sqrt{\lambda}\, \|f\|_{\mathcal{H}}, \quad 1 \leq i \leq n.$$

We can then summarize the offline phase for regularized kernel interpolation in 
Algorithm 9.2. 


Algorithm 9.2: Regularized Kernel interpolation - offline phase.

1: Input: training set $X_n \in \Omega^n$, $Y_n \in (\mathbb{R}^q)^n$, kernel $K$ (and kernel parameters), regularization parameter $\lambda \geq 0$.

2: Compute the kernel matrix $A \in \mathbb{R}^{n \times n}$, $A_{ij} := K(x_i, x_j)$.

3: Solve the linear system $(A + \lambda I)\alpha = y$.

4: Output: coefficients $\alpha \in \mathbb{R}^{n \times q}$.

Remark 9.4 (Flat limit). The matrix $A$ can be seriously ill-conditioned for certain kernels, and this constitutes a problem at least in the case of pure interpolation. It can also be proven that kernels which guarantee a faster error convergence result in a worse conditioned matrix [48].

For RBF kernels, this happens especially for $\gamma \to 0$ (the so-called flat limit), and it is usually not a good idea to directly solve the linear system. In recent years there has been very active research to compute $s$ via different formulations, which rely on different representations of the kernel. We mention here mainly the RBF-QR algorithm³ [18, 31] and the Hilbert-Schmidt SVD⁴ [16]. Both methods are limited so far to only some kernels, but they manage to achieve a great accuracy, which is usually impossible to obtain with the direct solution of the linear system.


Remark 9.5 (Error estimation). For SPD translational invariant kernels there is a very detailed error analysis of the interpolation process ($\lambda = 0$), for which we refer to [65, Chapter 11]. We only mention that these error bounds assume that $f \in \mathcal{H}$, and are of the form

$$\|f - s\|_{L_\infty(\Omega)} \leq C h_n^p \|f\|_{\mathcal{H}},$$

where $C > 0$ is a constant independent of $f$, and $h_n$ is the fill distance of $X_n$ in $\Omega$, i. e.,

$$h_n := h_{X_n, \Omega} := \sup_{x \in \Omega} \min_{x_j \in X_n} \|x - x_j\|,$$

which is the analogue of the mesh width for scattered data. Moreover, the order of convergence $p > 0$ is dependent on the smoothness of the kernel. In particular, these error bounds can be proven to be optimal when the native space of $K$ is a Sobolev space.

Moreover, these results have been recently extended to the case of regularized interpolation ($\lambda > 0$) in [43, 66].
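The fill distance appearing in these bounds can be approximated in practice by replacing the supremum over $\Omega$ with a maximum over a fine set of sample points. The following sketch assumes $\Omega = [0, 1]$ and a uniform sampling grid; the function name `fill_distance` and the data are illustrative.

```python
import numpy as np

def fill_distance(X, omega_samples):
    """Approximate sup_{x in Omega} min_j ||x - x_j|| on a finite sample of Omega."""
    d = np.linalg.norm(omega_samples[:, None, :] - X[None, :, :], axis=2)
    return d.min(axis=1).max()

X = np.array([[0.0], [0.25], [1.0]])             # three data points in [0, 1]
grid = np.linspace(0.0, 1.0, 1001)[:, None]      # fine sample of Omega = [0, 1]
h = fill_distance(X, grid)                       # half of the largest gap between data points
```

In one dimension the result is half of the largest gap between neighbouring points, here the gap between 0.25 and 1.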


9.4.1 Kernel greedy approximation 


The surrogate constructed via Corollary 9.1 involves a linear combination of $n$ terms, where $n$ is the size of the dataset. In general, there is no reason to assume that the result has any sparsity, i. e., in general all the $\alpha_j$ will be nonzero, and it is thus necessary to introduce some technique to enforce this sparsity.

A very effective way to achieve this result is via greedy algorithms. The idea is to select a small subset $X_N \subset X_n$, $N \ll n$, given by indices $I_N \subset \{1, \dots, n\}$, and to solve the corresponding restricted problem with the dataset $(X_N, Y_N)$ to compute a surrogate

$$s_N(x) := \sum_{k \in I_N} \alpha_k K(x, x_k), \tag{9.10}$$

where the coefficient vectors are computed based on (9.8), and are in general different from the ones of the full surrogate. If we manage to select $I_N$ in a proper way, we will obtain $s_N(x) \approx f(x)$ for all $x \in \Omega$, while the evaluation of $s_N(x)$ is now only of order $O(N)$.


3 http://www.it.uu.se/research/scientific_computing/software/rbf_qr
4 http://math.iit.edu/~mccomic/gaussqr/

An optimal selection of $X_N$ is a combinatorial problem and thus is very expensive and in practice computationally intractable. The idea of greedy algorithms is instead to perform this selection incrementally, i. e., adding at each iteration only the most promising new point, based on some error indicator.

The general structure of the algorithm is described in Algorithm 9.3. For the moment, we consider a generic selection rule $\eta : X_n \times \mathbb{N} \times \Omega^n \times (\mathbb{R}^q)^n \to \mathbb{R}_{\geq 0}$ that selects points based on the value $\eta(x, N, X_n, Y_n)$. This is a compact notation to denote that the selection rule assigns a score to a point $x \in \Omega$, and it is computed using various quantities that depend on the dataset $(X_n, Y_n)$ and on the iteration number $N$, including in particular the surrogate computed at the previous iteration. The algorithm is terminated by means of a given tolerance $\tau > 0$.


Algorithm 9.3: Kernel greedy approximation - offline phase.

1: Input: training set $X_n \in \Omega^n$, $Y_n \in (\mathbb{R}^q)^n$, kernel $K$ (and kernel parameters), regularization parameter $\lambda \geq 0$, selection rule $\eta$, tolerance $\tau$.

2: Set $N := 0$, $X_0 := \emptyset$, $V(X_0) := \{0\}$, $s_0 := 0$.

3: repeat

4: Set $N := N + 1$

5: Select $x_N := \operatorname{argmax}_{x \in X_n \setminus X_{N-1}} \eta(x, N, X_n, Y_n)$.

6: Define $X_N := X_{N-1} \cup \{x_N\}$ and $V(X_N) := \operatorname{span}\{K(\cdot, x_i),\ x_i \in X_N\}$

7: Compute the surrogate $s_N$ with dataset $(X_N, Y_N)$ with (9.8).

8: until $\eta(x_N, N, X_n, Y_n) \leq \tau$

9: Output: surrogate $s_N$ (i. e. coefficients $\alpha \in \mathbb{R}^{N \times q}$).


Remark 9.6. In the case that the maximizer of $\eta$ in line 5 of Algorithm 9.3 is not unique, only one of the multiple points is selected and included in $X_N$.


In line 7 of the algorithm, we need to compute the surrogate $s_N$ with dataset $(X_N, Y_N)$. This step can be highly simplified by reusing $s_{N-1}$ as much as possible, thus improving the efficiency of the algorithm. As a side effect, with this incremental procedure it is easy to update the surrogate if the accuracy has to be improved.

This can be achieved using the Newton basis, which is defined in analogy to the Newton basis for polynomial interpolation. It has been introduced in [37, 39] for $K$ SPD, and extended to the case of $K$ PD and $\lambda > 0$ in [47], and we refer to these papers for the proof of the following result.


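Proposition 9.4 (Newton basis). Let $\Omega$ be nonempty, $\lambda \geq 0$, $K$ be PD on $\Omega$ or SPD when $\lambda = 0$. Let $X_n \subset \Omega$ be pairwise distinct, and let $I_N \subset \{1, \dots, n\}$. Let moreover $K_\lambda(x, y) := K(x, y) + \lambda \delta_{xy}$ for all $x, y \in \Omega$, and denote its RKHS as $\mathcal{H}_\lambda$.

The Newton basis $\{v_k\}_{k=1}^N$ is defined as the Gram-Schmidt orthonormalization of $\{K_\lambda(\cdot, x_i)\}_{i \in I_N}$ in $\mathcal{H}_\lambda$, i. e.

$$v_1(x) := \frac{K_\lambda(x, x_{i_1})}{\|K_\lambda(\cdot, x_{i_1})\|_{\mathcal{H}_\lambda}} = \frac{K_\lambda(x, x_{i_1})}{\sqrt{K_\lambda(x_{i_1}, x_{i_1})}},$$

$$\tilde{v}_k(x) := K_\lambda(x, x_{i_k}) - \sum_{j=1}^{k-1} v_j(x_{i_k}) v_j(x),$$

$$v_k(x) := \frac{\tilde{v}_k(x)}{\|\tilde{v}_k\|_{\mathcal{H}_\lambda}} = \frac{\tilde{v}_k(x)}{\sqrt{\tilde{v}_k(x_{i_k})}}, \quad 1 < k \leq N.$$

Moreover, for all $1 \leq k \leq N$, we have

$$v_k(x) = \sum_{j=1}^N \beta_{jk} K_\lambda(x, x_{i_j}),$$

and, if $B \in \mathbb{R}^{N \times N}$, $B_{jk} := \beta_{jk}$, and $V \in \mathbb{R}^{N \times N}$, $V_{jk} := v_k(x_{i_j})$, then $B$, $V$ are triangular, $B = V^{-T}$, and

$$A_N + \lambda I = V V^T$$

is the Cholesky decomposition of the regularized kernel matrix $A_N + \lambda I \in \mathbb{R}^{N \times N}$, $A_{jk} := K(x_{i_j}, x_{i_k})$, with pivoting given by $I_N$.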


Observe that this basis is nested, i. e., we can incrementally add a new element 
without recomputing the previous ones. Even more, with this basis the surrogate can 
be computed as follows. 


Proposition 9.5 (Incremental regularized interpolation). Let $\Omega$ be nonempty, $\lambda \geq 0$, $K$ be PD on $\Omega$ or SPD when $\lambda = 0$. Let $(X_N, Y_N)$ be the subset of $(X_n, Y_n)$ corresponding to indices $I_N$, for all $N \leq n$.

Let $\tilde{s}_0 := 0$, and, for $N \geq 1$, compute the following incremental function

$$\tilde{s}_N(x) = \sum_{k=1}^N c_k v_k(x) = c_N v_N(x) + \tilde{s}_{N-1}(x), \quad c_N := \frac{y_{i_N} - \tilde{s}_{N-1}(x_{i_N})}{v_N(x_{i_N})}. \tag{9.11}$$

Then, for all $N$, the regularized interpolant can be computed as

$$s_N(x) = \sum_{j=1}^N \alpha_j K(x, x_{i_j}), \quad \text{where } \alpha := V^{-T} c.$$

Remark 9.7. In the case $\lambda = 0$ and $K$ SPD, the function $\tilde{s}_N$ coincides with the interpolant $s_N$. We refer to [39, 47] for the details.


We are now left to define the selection rules, represented by $\eta$, to select the new point at each iteration.

For this, we first need to define the power function, which can be defined using the Newton basis as

$$P_N(x)^2 := K_\lambda(x, x) - \sum_{j=1}^N v_j(x)^2. \tag{9.12}$$

Its relevance is due to the fact that it provides an upper bound on the pointwise (regularized) interpolation error, i. e., if $x \notin X_n$ and $K$ is PD, or SPD when $\lambda = 0$, we have for all $f \in \mathcal{H}$ that

$$|f(x) - s_N(x)| \leq P_N(x) \|f\|_{\mathcal{H}}. \tag{9.13}$$

This function is well known and has been studied in the case of pure interpolation (see e. g. [65, Chapter 11]), for which the upper bound holds for all $x \in \Omega$, and it can be easily extended to the case of regularized interpolation (see [47]). In both cases, it can be proven that $P_N(x) = 0$ if and only if $x \in X_N$, and its maximum is strictly decreasing with $N$.


Remark 9.8. This interpolation technique is closely related to the kriging method and to Gaussian Process Regression (see e. g. [38, 42]). In this case the kernel represents the covariance kernel of the prior distribution, and the power function is the kriging variance, or variance of the posterior distribution (see [50]).


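We can then define the following selection rules. We assume to have a dataset $(X_n, Y_n)$, and to have already selected $N - 1$ points corresponding to indices $I_{N-1}$. We use the notation $[1, n] := \{1, \dots, n\}$, and we have
-  P-greedy: $i_N := \operatorname{argmax}_{i \in [1,n] \setminus I_{N-1}} P_{N-1}(x_i)$;
-  f-greedy: $i_N := \operatorname{argmax}_{i \in [1,n] \setminus I_{N-1}} |y_i - s_{N-1}(x_i)|$;
-  f/P-greedy: $i_N := \operatorname{argmax}_{i \in [1,n] \setminus I_{N-1}} \dfrac{|y_i - s_{N-1}(x_i)|}{P_{N-1}(x_i)}$.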

Observe that all the selections are well defined, since $P_{N-1}(x_i) \neq 0$ for all $i \notin I_{N-1}$ if $X_n$ are pairwise distinct, and they can be efficiently implemented by using the update rules (9.11) for $s_N$ and (9.12) for $P_N$. Moreover, they are motivated by different ideas: The P-greedy selection tries to minimize the power function, thus providing a uniform upper bound on the error for any function $f \in \mathcal{H}$ via (9.13); the f- and f/P-greedy (which reads "f-over-P-greedy"), on the other hand, use also the output data, and produce points which are suitable to approximate a single function and thus are expected to result in a better approximation. In the case of f-greedy this is done by including in the set of points the location where the current largest error is achieved, thus reducing the maximum error. The f/P-greedy selection, instead, reduces the error in the $\mathcal{H}$-norm, and indeed it can be proven to be locally optimal, i. e., it guarantees the maximal possible reduction of the error, in the $\mathcal{H}$-norm, at each iteration.

We can now describe the full computation of the greedy regularized interpolant in Algorithm 9.4. It realizes the computation of the sparse surrogate $s_N$ by selecting the points $X_N$ via the index set $I_N$, and computing only the nonzero coefficients $\alpha$. Moreover, using the nested structure of the Newton basis and the incremental computation of Proposition 9.5, the algorithm needs only to compute the columns of the full kernel matrix corresponding to the index set $I_N$, and thus there is no need to compute nor store the full $n \times n$ matrix, i. e., the implementation is matrix-free. In addition, again using Proposition 9.5 most of the operations are done in-place, i. e., some vectors are used to store and update the values of the power function and of $y$. In the algorithm, we use a Matlab-like notation, i. e., $A(I_N, :)$ denotes the submatrix of $A$ consisting of rows $I_N$ and of all the columns. Moreover, the notation $v^2$ denotes the pointwise squaring of the entries of the vector $v$.


Algorithm 9.4: Kernel greedy approximation - offline phase.

 1: Input: training set $X_n \in \Omega^n$, $Y_n \in (\mathbb{R}^q)^n$, kernel $K$ (and kernel parameters),
    regularization parameter $\lambda \geq 0$, selection rule $\eta$, tolerance $\tau$.
 2: Set $N := 0$, $I_0 := \emptyset$, $V := [\cdot] \in \mathbb{R}^{n \times 0}$, $p := \operatorname{diag}(K_\lambda(X_n, X_n)) \in \mathbb{R}^n$
 3: repeat
 4:   Set $N := N + 1$
 5:   Select $i_N := \operatorname{argmax}_{i \in [1,n] \setminus I_{N-1}} \eta(x_i, N, X_n, Y_n)$
 6:   Generate column $v := K_\lambda(X_n, x_{i_N})$
 7:   Project $v := v - V V(i_N,:)^T$
 8:   Normalize $v := v / \sqrt{v(i_N)}$
 9:   Compute $c_N := y(i_N) / v(i_N)$
10:   Update the power function $p := p - v^2$
11:   Update the residual $y := y - c_N v$
12:   Update $I_N := I_{N-1} \cup \{i_N\}$
13:   Add the column $V := [V, v]$
14:   Update the inverse $C^T := V(I_N,:)^{-1}$
15:   Add the coefficient $c := [c^T, c_N]^T$
16: until $\eta(x_{i_N}, N, X_n, Y_n) \leq \tau$
17: Set $\alpha := C c$
18: Output: $\alpha \in \mathbb{R}^{N \times q}$, $I_N$.
The set of points $X_N$ defined by $I_N$, and the coefficients $\alpha$, can then be used in the
online phase of Algorithm 9.1.
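The offline phase above can be sketched in plain Python for scalar inputs and outputs ($q = 1$). This is only an illustration under stated assumptions, not the reference implementation: the regularized kernel matrix is assumed to be $K(X_n, X_n) + \lambda I$, the Gaussian kernel parametrization is an assumption, the stopping test is evaluated before each selection rather than at the end of the loop, and the names `gaussian` and `greedy_fit` are hypothetical. Only the P- and f-greedy rules are included.

```python
import math


def gaussian(x, z, gamma=1.0):
    """Scalar Gaussian kernel; the parametrization is an assumption."""
    return math.exp(-(gamma * (x - z)) ** 2)


def greedy_fit(X, Y, kernel, lam=0.0, rule="f", tol=1e-10):
    """Greedy Newton-basis fit for scalar data (q = 1), following Algorithm 9.4."""
    n = len(X)
    y = list(Y)                              # residual, overwritten at each step
    p = [kernel(x, x) + lam for x in X]      # squared power function
    V, c, I = [], [], []                     # basis columns, coefficients, indices
    while len(I) < n:
        free = [i for i in range(n) if i not in I]
        score = (lambda i: p[i]) if rule == "P" else (lambda i: abs(y[i]))
        i_N = max(free, key=score)
        if score(i_N) <= tol or p[i_N] <= 1e-14:
            break
        # column of the regularized kernel matrix (assumed to be K + lam * Id)
        v = [kernel(X[i], X[i_N]) + (lam if i == i_N else 0.0) for i in range(n)]
        for col in V:                        # orthogonalize against the old basis
            w = col[i_N]
            v = [vi - w * ci for vi, ci in zip(v, col)]
        scale = math.sqrt(v[i_N])
        v = [vi / scale for vi in v]         # now v[i_N] equals P_{N-1}(x_{i_N})
        c_N = y[i_N] / v[i_N]
        p = [pi - vi * vi for pi, vi in zip(p, v)]
        y = [yi - c_N * vi for yi, vi in zip(y, v)]
        V.append(v)
        c.append(c_N)
        I.append(i_N)
    # alpha = C c with C^T = V(I_N, :)^{-1}: back substitution on L^T alpha = c,
    # where L[m][k] = V[k][I[m]] is lower triangular
    N = len(I)
    alpha = [0.0] * N
    for k in reversed(range(N)):
        tail = sum(V[k][I[m]] * alpha[m] for m in range(k + 1, N))
        alpha[k] = (c[k] - tail) / V[k][I[k]]
    return alpha, I


X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [math.sin(x) for x in X]
alpha, I = greedy_fit(X, Y, gaussian)


def predict(x):
    """Online phase: evaluate the sparse kernel expansion."""
    return sum(a * gaussian(x, X[i]) for a, i in zip(alpha, I))
```

For simplicity the sketch computes $\alpha$ by a single back substitution at the end instead of updating the inverse incrementally as in line 14 of the algorithm; only the columns of the kernel matrix belonging to selected points are ever generated.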


Remark 9.9 (Vector-valued functions and implementation details). Algorithm 9.4 and
the overall procedure are well defined for arbitrary $q \geq 1$. Indeed, using the separable
matrix-valued kernel of Section 9.2.4, the Newton basis only depends on the scalar-valued
kernel $K$, while the computation of the coefficients is valid by considering that
now $c$, $\alpha$ are matrices instead of vectors. In particular, the computation of $c_N$ (line 9)
and the update of $y$ (line 11) have to be done via column-wise multiplications.

Moreover, observe that in line 14 we employ a standard technique to update the inverse
of a lower triangular matrix, i.e., given $V_N \in \mathbb{R}^{N \times N}$ lower triangular with inverse
$V_N^{-1}$, we define

$$V_{N+1} := \begin{bmatrix} V_N & 0 \\ v^T & w \end{bmatrix}$$

for $v \in \mathbb{R}^N$, $w \in \mathbb{R}$, and compute $V_{N+1}^{-1}$ by a simple row-update as

$$V_{N+1}^{-1} = \begin{bmatrix} V_N^{-1} & 0 \\ -v^T V_N^{-1}/w & 1/w \end{bmatrix}.$$

The present version of the algorithm for vector-valued functions has been introduced
in [68] and named Vectorial Kernel Orthogonal Greedy Algorithm (VKOGA). We
keep the same abbreviation also for the regularized version, which has been studied
in [47].
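The row-update of the inverse of a lower triangular matrix described in this remark can be checked numerically with a few lines of Python. The sketch below stores matrices as lists of rows; the function name `extend_lower_inverse` and the test matrix are hypothetical.

```python
def extend_lower_inverse(V_inv, v, w):
    """Given V_inv = V_N^{-1}, return the inverse of [[V_N, 0], [v^T, w]]."""
    N = len(V_inv)
    # new last row: -(v^T V_N^{-1}) / w, followed by the diagonal entry 1 / w
    last = [-sum(v[k] * V_inv[k][j] for k in range(N)) / w for j in range(N)]
    rows = [row + [0.0] for row in V_inv]
    rows.append(last + [1.0 / w])
    return rows


# V_N = [[2, 0], [1, 4]] has the inverse below; it is extended by v = (3, 5), w = 10.
V_inv = [[0.5, 0.0], [-0.125, 0.25]]
V_next = [[2.0, 0.0, 0.0], [1.0, 4.0, 0.0], [3.0, 5.0, 10.0]]
V_next_inv = extend_lower_inverse(V_inv, [3.0, 5.0], 10.0)

# The product V_next * V_next_inv should be the 3 x 3 identity.
product = [[sum(V_next[i][k] * V_next_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

The update touches only the new row, so the cost of extending the inverse from size $N$ to $N+1$ is quadratic in $N$ rather than cubic.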


Remark 9.10 (Convergence rates). When the greedy algorithm is run by selecting
points over $\Omega$ instead of $X_n$, there are also convergence rates for the resulting
approximation processes. For pure interpolation (i.e., $K$ SPD, $\lambda = 0$) convergence of f-greedy
has been proven in [36], of P-greedy in [46], and of f/P-greedy in [68], while in [47]
the convergence rate of P-greedy has been extended to regularized interpolation. All
the results make additional assumptions on the kernels, for which we refer to the
cited literature. Nevertheless, we remark that the convergence rates for interpolation
with P-greedy are quasi-optimal for translation-invariant kernels, while the results
for the other algorithms guarantee only a possibly significantly slower convergence
rate. These results are believed to be significantly sub-optimal, since extensive
experiments indicate that f- and f/P-greedy behave much better. This suggests
that there is room for a large improvement in the theoretical understanding
of the methods.


Remark 9.11 (Other techniques). There are other techniques that can be applied to
reduce the complexity of the evaluation of the surrogate $s$, which do not use greedy
algorithms but instead different approaches. First, there is a domain decomposition
technique, known as Partition of Unity Method, which partitions $\Omega$ into subdomains,
solves the (regularized) interpolation problem restricted to each patch, and then
combines the results by a weighted sum of the local interpolants to obtain a global
approximant. This method has the advantage that the offline phase can be completely
parallelized. Moreover, when evaluating the surrogate only the few local interpolants
whose support contains the test points have to be evaluated. This requires the
evaluation of only a few, small kernel expansions, and thus provides a significant speedup.
The efficiency of this technique relies on an efficient search procedure to determine
the local patches including the given points, which is the only limitation in the
application to high dimensional problems. Both theoretical results and efficient
implementations are available [7, 64].

Moreover, other sparsity-inducing techniques have been proposed, namely, the
use of an $\ell_1$-regularization term [10], and the method of the Least Absolute Shrinkage
and Selection Operator (LASSO) [61].


9.5 Support vector regression 


The second method that we present is Support Vector Regression (SVR) [53], which is
based on different premises, but still fits in the general framework of Section 9.3. In
this case, we consider the $\varepsilon$-insensitive loss function

$$L(s, X_n, Y_n) := \sum_{i=1}^n L_\varepsilon(s(x_i), y_i), \qquad L_\varepsilon(s(x_i), y_i) := \max(0, |s(x_i) - y_i| - \varepsilon),$$

which is designed to linearly penalize functions $s$ which have values outside of an
$\varepsilon$-tube around the data, while no distinction is made between function values that are
inside this tube.
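As a small numerical illustration of this loss, the following sketch evaluates it for given predictions and targets; the function name `eps_insensitive_loss` and the sample values are hypothetical.

```python
def eps_insensitive_loss(predictions, targets, eps):
    """Sum of max(0, |s(x_i) - y_i| - eps) over all data points."""
    return sum(max(0.0, abs(s - y) - eps) for s, y in zip(predictions, targets))


# Only the second prediction leaves the tube of width 0.1, by 0.5 - 0.1 = 0.4.
loss = eps_insensitive_loss([1.0, 2.5, 3.05], [1.0, 2.0, 3.0], eps=0.1)
```

The first and third predictions lie inside the tube and contribute nothing, regardless of how close they are to the target.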

In this setting it is common to use the regularization parameter to scale the cost by
a factor $1/\lambda$, and not the regularization term by a factor $\lambda$. The two choices are clearly
equivalent, but we adopt here this different normalization to facilitate the comparison
with the existing literature, and because this offers additional insights into the structure
of the surrogate.

Since the problem is not quadratic (and not smooth), we first derive an equivalent
formulation of the optimization problem (9.5). Assuming again that the output is
scalar, i.e., $q = 1$, the idea is to introduce non-negative slack variables $\xi^+, \xi^- \in \mathbb{R}^n$
which represent upper bounds on $L_\varepsilon$ via

$$\begin{aligned}
\xi_i^+ &\geq \max(0, s(x_i) - y_i - \varepsilon), \quad 1 \leq i \leq n, \\
\xi_i^- &\geq \max(0, y_i - s(x_i) - \varepsilon), \quad 1 \leq i \leq n,
\end{aligned} \tag{9.14}$$

and to minimize them in place of the original loss. With these new variables we can
rewrite the optimization problem in the following equivalent way.
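Definition 9.5 (SVR - primal form). Let $\Omega$ be a nonempty set, $K$ a PD kernel on $\Omega$, $\lambda > 0$
a regularization parameter. For a training set $(X_n, Y_n)$ the SVR approximant $(s, \xi^+, \xi^-) \in
\mathcal{H} \times \mathbb{R}^{2n}$ is a solution of the quadratic optimization problem

$$\begin{aligned}
\min_{s \in \mathcal{H},\ \xi^+, \xi^- \in \mathbb{R}^n} \quad & \frac{1}{\lambda} 1_n^T (\xi^+ + \xi^-) + \|s\|_{\mathcal{H}}^2 \\
\text{s.t.} \quad & s(x_i) - y_i - \varepsilon \leq \xi_i^+, \quad 1 \leq i \leq n, \\
& -s(x_i) + y_i - \varepsilon \leq \xi_i^-, \quad 1 \leq i \leq n, \\
& \xi_i^+, \xi_i^- \geq 0, \quad 1 \leq i \leq n,
\end{aligned} \tag{9.15}$$

where $1_n := (1,\dots,1)^T \in \mathbb{R}^n$.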


For this rewriting of the optimization problem, we can now specialize the Repre- 
senter Theorem as follows. 


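Corollary 9.2 (SVR - alternative primal form). Let $\Omega$ be a nonempty set, $K$ a PD kernel
on $\Omega$, $\lambda > 0$ a regularization parameter. For any training set $(X_n, Y_n)$ there exists an SVR
approximant of the form

$$s(x) = \sum_{j=1}^n \alpha_j K(x, x_j), \quad x \in \Omega, \tag{9.16}$$

where $(\alpha, \xi^+, \xi^-) \in \mathbb{R}^{3n}$ is a solution of the quadratic optimization problem

$$\begin{aligned}
\min_{\alpha, \xi^+, \xi^- \in \mathbb{R}^n} \quad & \frac{1}{\lambda} 1_n^T (\xi^+ + \xi^-) + \alpha^T A \alpha \\
\text{s.t.} \quad & (A\alpha)_i - y_i - \varepsilon \leq \xi_i^+, \quad 1 \leq i \leq n, \\
& -(A\alpha)_i + y_i - \varepsilon \leq \xi_i^-, \quad 1 \leq i \leq n, \\
& \xi_i^+, \xi_i^- \geq 0, \quad 1 \leq i \leq n,
\end{aligned} \tag{9.17}$$

with $1_n := (1,\dots,1)^T \in \mathbb{R}^n$, and $A \in \mathbb{R}^{n \times n}$, $A_{ij} := K(x_i, x_j)$, the kernel matrix on $X_n$.
Moreover, if $K$ is SPD this is the unique solution of the minimization problem (9.5).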


Proof. The result is an immediate consequence of Proposition 9.5, where we use the 
form (9.16) for s and compute its squared norm via Proposition 9.2. 


The slack variables (9.14) have a nice geometric interpretation. Indeed, the opti- 
mization process clearly tries to reduce their value as much as possible, while respect- 
ing the constraints. We state a more precise result in the following proposition, and 
give a schematic illustration in Figure 9.1. 


Proposition 9.6 (Slack variables). Let $\alpha, \xi^+, \xi^- \in \mathbb{R}^n$ be a solution of (9.17), and let $s$ be
the corresponding surrogate (9.16). Then, for each index $i \in \{1,\dots,n\}$, the values $\xi_i^+$, $\xi_i^-$
represent the distance of $s(x_i)$ from the $\varepsilon$-tube around $y_i$, and in particular
1. If $s(x_i) > y_i + \varepsilon$, then $\xi_i^+ > 0$ and $\xi_i^- = 0$.
2. If $s(x_i) < y_i - \varepsilon$, then $\xi_i^+ = 0$ and $\xi_i^- > 0$.
3. If $y_i - \varepsilon \leq s(x_i) \leq y_i + \varepsilon$, then $\xi_i^+ = 0$ and $\xi_i^- = 0$.

In particular, only one of $\xi_i^+$ and $\xi_i^-$ can be nonzero.


Figure 9.1: Illustration of the role of the slack variables in (9.17).


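Instead of solving the primal problem of Corollary 9.2, it is more common to derive
and solve the following dual problem. Here again we denote by $y \in \mathbb{R}^n$ the vector of
all scalar training target values.


Proposition 9.7 (SVR - dual form). Let $\Omega$ be a nonempty set, $K$ a PD kernel on $\Omega$, $\lambda > 0$
a regularization parameter. For any training set $(X_n, Y_n)$ there exists a solution $(\alpha^+, \alpha^-) \in
\mathbb{R}^{2n}$ of the problem

$$\begin{aligned}
\min_{\alpha^+, \alpha^- \in \mathbb{R}^n} \quad & \frac{1}{4} (\alpha^- - \alpha^+)^T A (\alpha^- - \alpha^+) + \varepsilon 1_n^T (\alpha^+ + \alpha^-) + y^T (\alpha^+ - \alpha^-) \\
\text{s.t.} \quad & \alpha^+, \alpha^- \in [0, 1/\lambda]^n,
\end{aligned} \tag{9.18}$$

which is unique if $K$ is SPD. Moreover, a solution of (9.17) is given by

$$s(x) := \sum_{j=1}^n \frac{\alpha_j^- - \alpha_j^+}{2} K(x, x_j), \quad x \in \Omega, \tag{9.19}$$

with $\xi_i^+ := \max(0, s(x_i) - y_i - \varepsilon)$, $\xi_i^- := \max(0, y_i - s(x_i) - \varepsilon)$.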


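Proof. We give a sketch of the proof, although a formal derivation requires more care,
and we refer to [53, Chapter 9] for the details. The idea is to first derive the Lagrangian
$\mathcal{L} := \mathcal{L}(\alpha, \xi^+, \xi^-; \alpha^+, \alpha^-, \mu^+, \mu^-)$ for the primal problem (9.17) using non-negative
Lagrange multipliers $\alpha^+, \alpha^-, \mu^+, \mu^- \in \mathbb{R}^n$ for the inequality constraints, and then derive
the dual problem by imposing the Karush-Kuhn-Tucker (KKT) conditions (see e.g.
Chapter 6 in [53]).

The Lagrangian is defined as

$$\begin{aligned}
\mathcal{L} &= \frac{1}{\lambda} 1_n^T (\xi^+ + \xi^-) + \alpha^T A \alpha + (\mu^+)^T (-\xi^+) + (\mu^-)^T (-\xi^-) \\
&\quad + (A\alpha - y - \varepsilon 1_n - \xi^+)^T \alpha^+ + (y - A\alpha - \varepsilon 1_n - \xi^-)^T \alpha^- \\
&= (\alpha + \alpha^+ - \alpha^-)^T A \alpha + \Big(\frac{1}{\lambda} 1_n - \alpha^+ - \mu^+\Big)^T \xi^+ + \Big(\frac{1}{\lambda} 1_n - \alpha^- - \mu^-\Big)^T \xi^- \\
&\quad - \varepsilon 1_n^T (\alpha^+ + \alpha^-) - y^T (\alpha^+ - \alpha^-).
\end{aligned} \tag{9.20}$$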


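Using the symmetry of $A$, the partial derivatives of $\mathcal{L}$ with respect to the primal
variables can be computed as

$$\nabla_\alpha \mathcal{L} = 2A\alpha + A(\alpha^+ - \alpha^-), \quad \nabla_{\xi^+} \mathcal{L} = \frac{1}{\lambda} 1_n - \alpha^+ - \mu^+, \quad \nabla_{\xi^-} \mathcal{L} = \frac{1}{\lambda} 1_n - \alpha^- - \mu^-, \tag{9.21}$$

and setting these three gradients to zero we obtain equations for $\alpha$, $\mu^+$, $\mu^-$, where in
particular $\alpha = \frac{1}{2}(\alpha^- - \alpha^+)$ (which is the unique solution if $A$ is invertible). Substituting
these values in the Lagrangian we get

$$\mathcal{L} = -\frac{1}{4} (\alpha^- - \alpha^+)^T A (\alpha^- - \alpha^+) - \varepsilon 1_n^T (\alpha^+ + \alpha^-) - y^T (\alpha^+ - \alpha^-),$$

and maximizing this expression over the multipliers is equivalent to minimizing the
objective in (9.18).

The remaining conditions in (9.18) stem from the requirements that the Lagrange
multipliers are non-negative, and in particular $0 \leq \mu_i^+ = 1/\lambda - \alpha_i^+$ implies $\alpha_i^+ \leq 1/\lambda$, and
similarly for $\alpha_i^-$.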


This dual formulation is particularly convenient to explain that the SVR surrogate
has a built-in sparsity, i.e., the optimization process provides a solution where
possibly many of the entries of $\alpha = \frac{1}{2}(\alpha^- - \alpha^+)$ are zero. This behavior is in strong
contrast with the case of interpolation of Section 9.4, where we needed to adopt special
techniques to enforce this property. The points $x_i \in X_n$ with $\alpha_i \neq 0$ are called support
vectors, which gives the name to the method.

In particular, as for the slack variables, there is a clean geometric description of
this sparsity pattern, which gives additional insights into the solution. To see this we
remark that, in addition to the stationarity KKT conditions (9.21), an optimal solution
satisfies also the complementarity KKT conditions

$$\alpha_i^+ (s(x_i) - y_i - \varepsilon - \xi_i^+) = 0, \quad \alpha_i^- (y_i - s(x_i) - \varepsilon - \xi_i^-) = 0, \tag{9.22}$$
$$\xi_i^+ (1/\lambda - \alpha_i^+) = 0, \quad \xi_i^- (1/\lambda - \alpha_i^-) = 0. \tag{9.23}$$

We then have the following:

1. Equation (9.22) states that $\alpha_i^+ \neq 0$ only if $s(x_i) - y_i - \varepsilon - \xi_i^+ = 0$, and similarly for $\alpha_i^-$.
   Since $\xi_i^+ \geq 0$, this happens only when $s(x_i) - y_i \geq \varepsilon$, i.e., only for points $(x_i, s(x_i))$
   which are outside or on the boundary of the $\varepsilon$-tube.

2. In particular, if $\alpha_i^+ \neq 0$ it follows that $s(x_i) - y_i \geq \varepsilon$, and thus $y_i - s(x_i) - \varepsilon - \xi_i^- \neq 0$,
   and then necessarily $\alpha_i^- = 0$. Thus, at most one of $\alpha_i^+$ and $\alpha_i^-$ can be nonzero.

3. Equation (9.23) implies that $\alpha_i^+$, $\alpha_i^-$ equals $1/\lambda$ whenever $\xi_i^+$, $\xi_i^-$ is nonzero, i.e.,
   whenever $s(x_i)$ is strictly outside of the $\varepsilon$-tube. The corresponding $x_i$ are called bounded
   support vectors, and the value of the corresponding coefficients is indeed kept
   bounded by the value of the regularization parameter. Reducing $\lambda$, i.e., using less
   regularization, allows solutions with coefficients of larger magnitude.


In summary, we can then expect that, if s is a good approximation of the data, it will 
be also a sparse approximation. 
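The geometric reading of the complementarity conditions can be turned into a simple classification of the training points from a dual solution. The sketch below assumes that $\alpha^+$, $\alpha^-$ are given as lists; the function name, the label "free support" for support vectors with coefficient strictly below $1/\lambda$, and the sample values are hypothetical.

```python
def classify_points(alpha_plus, alpha_minus, lam, tol=1e-12):
    """Label each point according to the size of its dual coefficients."""
    bound = 1.0 / lam
    labels = []
    for a_plus, a_minus in zip(alpha_plus, alpha_minus):
        a = max(a_plus, a_minus)          # at most one of the two is nonzero
        if a <= tol:
            labels.append("non-support")      # strictly inside the tube
        elif a >= bound - tol:
            labels.append("bounded support")  # strictly outside the tube
        else:
            labels.append("free support")     # on the boundary of the tube
    return labels


labels = classify_points([0.0, 0.3, 0.0], [0.0, 0.0, 2.0], lam=0.5)
```

With $\lambda = 0.5$ the bound is $1/\lambda = 2$, so the three points are labeled as non-support, free support and bounded support, respectively.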

We summarize the offline phase for SVR in Algorithm 9.5. We remark that in this 
case the extension to vector-valued functions is not as straightforward as for kernel in- 
terpolation, and it is thus common to train a separate SVR for each output component. 


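Algorithm 9.5: SVR - offline phase.

1: Input: training set $X_n \in \Omega^n$, $Y_n \in \mathbb{R}^n$, kernel $K$ (and kernel parameters),
   regularization parameter $\lambda > 0$, tube width $\varepsilon > 0$.
2: Compute the kernel matrix $A \in \mathbb{R}^{n \times n}$, $A_{ij} := K(x_i, x_j)$.
3: Solve the quadratic problem (9.18).
4: Set $I_N := \{i : \alpha_i^+ \neq 0 \text{ or } \alpha_i^- \neq 0\}$.
5: Set $\alpha_i := (\alpha_i^- - \alpha_i^+)/2$ for $i \in I_N$.
6: Output: $\alpha \in \mathbb{R}^N$, $I_N$.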


Remark 9.12 (General Support Vector Machines). SVR is indeed one member of a vast
collection of algorithms related to Support Vector Machines (SVMs). Standard SVMs
solve classification problems, i.e., $Y_n \subset \{0, 1\}$. The original algorithm has been
introduced as a linear algorithm (or, in the present understanding, as limited to the linear
kernel, i.e., the polynomial kernel with $a = 0$, $p = 1$), and it has later been extended
via the kernel trick to its general kernel version in [4]. The SVR algorithms have instead
been introduced in [53].

Moreover, the version presented here is usually called $\varepsilon$-SVR. There exists also another,
non-equivalent version called $\nu$-SVR, which adds another term in the cost function
multiplied by a factor $\nu \in [0,1]$. This plays the role of giving an upper bound on
the number of support vectors and on the fraction of training data which are outside
of the $\varepsilon$-tube (see Chapter 9 in [53]).

We also remark that it is common to include in any SVM-based algorithm
also an offset or bias term $b \in \mathbb{R}$, i.e., to obtain a surrogate $s(x) = \sum_j \alpha_j K(x, x_j) + b$.
This changes in an obvious way the primal problem (9.17), while the dual contains
also the constraint $\sum_i (\alpha_i^+ - \alpha_i^-) = 0$. However, we stick here to the formulation without
offset and refer to [57] for a discussion of statistical and numerical benefits of not using this offset
term, at least in the case of SPD kernels.


Remark 9.13 (Error estimation). Also for SVR there is a detailed error theory, usually
formulated in the framework of statistical learning theory (see [62]). Results are obtained
by assuming that the dataset $(X_n, Y_n)$ is drawn from a certain unknown probability
distribution, and then quantifying the approximation power of the surrogate.
For a detailed treatment of this theory, we refer to [53, 55]. Moreover, recently also
deterministic error bounds for translation-invariant kernels have been proven in
[43, 44].


9.5.1 Sequential minimal optimization 


Although the optimization problem (9.18) can in principle be solved with any quadratic
optimization method, there exists a special algorithm, called Sequential Minimal
Optimization (SMO), that is designed for SVMs and that performs possibly much better.

SMO is an iterative method which improves an initial feasible guess for $\alpha^+, \alpha^- \in \mathbb{R}^n$
until convergence, and the update is made such that the minimal possible number of
entries of $\alpha^+, \alpha^-$ are affected. In this way, very large problems can be efficiently solved. The
original version of the algorithm has been introduced in [41] for SVM, and it has later
been adapted to more general methods such as SVR, which we use here to illustrate
the structure of its implementation.

The idea is to find at each iteration $\ell \in \mathbb{N}$ a minimal set of indices $I^\ell \subset \{1,\dots,n\}$ and
optimize only the variables with indices in $I^\ell$. The procedure is then iterated until the
optimum is reached. If the SVR includes an offset term, as explained in the previous
section we have constraints

$$\begin{aligned}
& \alpha_i^+, \alpha_i^- \in [0, 1/\lambda], \quad 1 \leq i \leq n, \\
& \sum_{i=1}^n (\alpha_i^+ - \alpha_i^-) = 0.
\end{aligned} \tag{9.24}$$

Given a feasible solution $(\alpha^+, \alpha^-)$ at iteration $\ell \in \mathbb{N}$, it is thus not possible to update
a single entry of $\alpha^+$ or $\alpha^-$ without violating the KKT conditions (since at most one
of $\alpha_i^+$ and $\alpha_i^-$ can be nonzero) or violating the second constraint. It is instead
possible to select two indices $I^\ell := \{i,j\}$, and in this case we have the variables
$\alpha_i^+, \alpha_i^-, \alpha_j^+, \alpha_j^-$ and we can solve the restricted quadratic optimization problem
under the constraints

$$\alpha_k^+, \alpha_k^- \in [0, 1/\lambda], \ k \in I^\ell, \qquad \sum_{k \in I^\ell} (\alpha_k^+ - \alpha_k^-) = R := -\sum_{k \notin I^\ell} (\alpha_k^+ - \alpha_k^-),$$

which can be solved analytically.

The crucial step is to select $I^\ell$, and this is done by finding a first index that does
not satisfy the KKT conditions and a second one with some heuristic. It can be proven
that, if at least one of the two violates the KKT conditions, then the objective is strictly
decreased and convergence is obtained. Moreover, the vectors $\alpha^+ = \alpha^- = 0 \in \mathbb{R}^n$
are always feasible and can thus be used as a first guess. In practice, the iteration is
stopped when a sufficiently small value of the cost function is reached.

In the case of SVR without offset discussed in the previous section the situation is
even simpler, since the second constraint in (9.24) is not present and it is thus possible
to update a single pair $(\alpha_i^+, \alpha_i^-)$ at each iteration. Nevertheless, it has been proven in
[57] that using two indices also in this case significantly improves the speed of
convergence. Moreover, the same paper introduces several additional details to select the
pair, to optimize the restricted cost function, and to establish termination conditions.
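To illustrate the simplest situation mentioned above, namely SVR without offset with a single pair $(\alpha_i^+, \alpha_i^-)$ updated at a time, the following sketch runs cyclic single-index updates on the dual (9.18). It is not the 2-index SMO of [57]. It relies on the substitution $\beta := \alpha^- - \alpha^+$, which is an assumption of this sketch: since at most one of $\alpha_i^+$, $\alpha_i^-$ is nonzero at the optimum, the objective becomes $\frac{1}{4}\beta^T A \beta + \varepsilon \|\beta\|_1 - y^T \beta$ with $\beta_i \in [-1/\lambda, 1/\lambda]$, and each one-dimensional problem is solved in closed form by soft thresholding and clipping. All function names and data are hypothetical.

```python
import math


def soft_threshold(g, eps):
    """Shrink g towards zero by eps."""
    if g > eps:
        return g - eps
    if g < -eps:
        return g + eps
    return 0.0


def svr_coordinate_descent(A, y, lam, eps, sweeps=200):
    """Cyclic single-index updates for (9.18), written in beta = alpha^- - alpha^+."""
    n = len(y)
    bound = 1.0 / lam
    beta = [0.0] * n                      # feasible start: alpha^+ = alpha^- = 0
    for _ in range(sweeps):
        for i in range(n):
            # target minus the surrogate value at x_i without the i-th term
            g = y[i] - 0.5 * sum(A[i][j] * beta[j] for j in range(n) if j != i)
            b = 2.0 * soft_threshold(g, eps) / A[i][i]
            beta[i] = min(bound, max(-bound, b))   # enforce the box constraint
    return [0.5 * b for b in beta]        # alpha_i = (alpha_i^- - alpha_i^+) / 2


X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 0.5, -0.5]
A = [[math.exp(-(xi - xj) ** 2) for xj in X] for xi in X]   # Gaussian kernel matrix
eps = 0.1
alpha = svr_coordinate_descent(A, y, lam=1e-3, eps=eps)
fit = [sum(A[i][j] * alpha[j] for j in range(len(X))) for i in range(len(X))]
```

With the weak regularization used here the box constraints stay inactive, so all fitted values end up inside or on the boundary of the $\varepsilon$-tube around the data.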

A general version of SMO for SVR is summarized in Algorithm 9.6, where we assume
that the function $\eta : \{1,\dots,n\} \to \{1,\dots,n\}^2$ implements the selection rule of $I^\ell$.


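Algorithm 9.6: SMO.

 1: Input: training set $X_n \in \Omega^n$, $Y_n \in \mathbb{R}^n$, kernel $K$ (and kernel parameters),
    regularization parameter $\lambda > 0$, tube width $\varepsilon > 0$, selection rule $\eta$, tolerance $\tau$.
 2: Set $\ell := 0$ and $(\alpha^+, \alpha^-)^{(0)} := (0, 0)$.
 3: while $(\alpha^+, \alpha^-)^{(\ell)}$ does not satisfy KKT conditions within tolerance $\tau$ do
 4:   Set $\ell := \ell + 1$.
 5:   Set $I^\ell := \{i,j\} := \eta(\{1,\dots,n\})$.
 6:   Set $(\alpha_k^+, \alpha_k^-)^{(\ell)} := (\alpha_k^+, \alpha_k^-)^{(\ell-1)}$ for $k \notin I^\ell$.
 7:   Solve the optimization problem restricted to $I^\ell$.
 8: end while
 9: Set $I_N := \{i : \alpha_i^+ \neq 0 \text{ or } \alpha_i^- \neq 0\}$.
10: Set $\alpha_i := (\alpha_i^- - \alpha_i^+)/2$ for $i \in I_N$.
11: Output: $\alpha \in \mathbb{R}^N$, $I_N$.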


Remark 9.14 (Reference implementations). We remark that there exist commonly
used implementations of SVR (and other SVM-related algorithms), which are available
in several programming languages and implement also some version of this
algorithm. We mention especially LIBSVM⁵ [9] and liquidSVM⁶ [58].


9.6 Model analysis using the surrogate 


Apart from predicting new inputs with good accuracy and a significant speedup, the 
surrogate model can be used to perform a variety of different tasks related to meta- 
modeling, such as uncertainty quantification and state estimation. This can be done 
in a non-intrusive way, meaning that the full model is employed as a black-box that 
provides input—output pairs to train the surrogate, but is not required to be modified. 


5 https://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
6 https://github.com/liquidSVM/liquidSVM 
In principle, any kind of analysis that requires multiple evaluations can be significantly
accelerated by the use of a surrogate, including the ones that are not computationally
feasible due to the high computational cost of the full model. An example
is uncertainty quantification, where the expected value of $f$ can be approximated by
a Monte Carlo integration of $s$ using a set $X_m \subset \Omega$ of integration points, i.e.,

$$\int_\Omega f(x)\, dx \approx \frac{1}{m} \sum_{i=1}^m s(x_i).$$

Once the surrogate is computed using a training set $(X_n, Y_n)$, this approximate integral
can be evaluated also for $m \gg n$ with a possibly very small cost, since the evaluation
of $s$ is significantly cheaper than the one of $f$.
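A minimal sketch of this Monte Carlo approximation is given below. It assumes that the integration points are drawn uniformly from a domain of unit volume (here $\Omega = [0,1]$), so that the plain average applies; the stand-in surrogate $s(x) = x^2$ and the names `monte_carlo_mean` and `sample_point` are hypothetical.

```python
import random


def monte_carlo_mean(surrogate, sample_point, m, seed=0):
    """Average the surrogate over m randomly drawn integration points."""
    rng = random.Random(seed)
    return sum(surrogate(sample_point(rng)) for _ in range(m)) / m


# Stand-in surrogate s(x) = x^2 on [0, 1]; the exact integral is 1/3.
estimate = monte_carlo_mean(lambda x: x * x, lambda rng: rng.random(), m=100_000)
```

Since each evaluation of the surrogate is cheap, the number $m$ of integration points can be chosen much larger than the number of full model runs used for training.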

Another example, which we describe in detail in the following, is the solution of
an inverse problem to estimate the input parameter which generated a given output,
i.e., from a given vector $y \in \mathbb{R}^q$ we want to estimate $x \in \Omega$ such that $f(x) = y$. This can
be done by considering a state-estimation cost function $C : \Omega \to \mathbb{R}$ defined by

$$C(x) := \frac{1}{2\|y\|_2^2} \|s(x) - y\|_2^2 \tag{9.25}$$

and estimating $x$ by the value $x^*$ defined as

$$x^* := \operatorname{argmin}_{x \in \Omega} C(x).$$

In principle, we could perform the same optimization also using $f$ instead of $s$ in (9.25),
but the surrogate allows a rapid evaluation of $C$. Moreover, if $K$ is at least differentiable,
then also $s$ is differentiable, and thus we can use gradient-based methods to
minimize $C$.

To detail this approach, we assume $f : \Omega \to \mathbb{R}^q$ and to have a surrogate obtained
as in Section 9.4.1 with the separable matrix-valued kernel of Section 9.2.4, i.e.,
from (9.10) we have

$$s_N(x) = \sum_{k \in I_N} \alpha_k K(x, x_k).$$

As explained in (9.9), in the vector-valued case $q > 1$ we always assume that the output
$s_N(x)$ and the coefficients $\alpha_k$ are row vectors, and in particular $\alpha \in \mathbb{R}^{N \times q}$ and $s_N(x) \in
\mathbb{R}^{1 \times q}$. In this case we have the following.


Proposition 9.8 (Gradient of the state-estimation cost). For $x \in \Omega \subset \mathbb{R}^d$ and $y \in \mathbb{R}^{1 \times q}$,
the gradient of the cost (9.25) can be computed in $x \in \Omega$ as

$$\nabla C(x) = \frac{1}{\|y\|_2^2} (D\alpha) E^T,$$

where $D \in \mathbb{R}^{d \times N}$ with columns $D_j := \nabla_x K(x, x_j)$, and $E := s_N(x) - y \in \mathbb{R}^{1 \times q}$.

Proof. By linearity, the gradient of $s_N$ in $x$ can be computed as

$$\nabla s_N(x) = \sum_{j \in I_N} \nabla_x K(x, x_j)\, \alpha_j = D\alpha \in \mathbb{R}^{d \times q},$$

and thus

$$\nabla C(x) = \frac{1}{2\|y\|_2^2} \nabla_x \|s_N(x) - y\|_2^2 = \frac{1}{\|y\|_2^2} \nabla s_N(x)\, (s_N(x) - y)^T = \frac{1}{\|y\|_2^2} (D\alpha) E^T.$$

Observe in particular that whenever $K$ is known in closed form the matrix $D$ can be
explicitly computed, and thus the gradient can be assembled using only matrix-vector
multiplications of matrices of dimensions $N$, $d$, $q$, but independent of $n$. The solution
$x^*$ can then be computed by any gradient-based optimization method, and each iteration
can be performed in an efficient way.
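The gradient formula of Proposition 9.8 can be verified numerically. The sketch below assumes a Gaussian kernel of the form $K(x,z) = \exp(-\gamma^2 \|x - z\|_2^2)$, for which $\nabla_x K(x,z) = -2\gamma^2 (x - z) K(x,z)$; the kernel parametrization, the function names, and the toy centers and coefficients are all assumptions made for illustration.

```python
import math


def gauss(x, z, gamma):
    """Gaussian kernel exp(-gamma^2 ||x - z||^2) (assumed parametrization)."""
    return math.exp(-gamma ** 2 * sum((a - b) ** 2 for a, b in zip(x, z)))


def surrogate(x, centers, alpha, gamma):
    """s_N(x) as a list of q values; alpha[j] is the coefficient row of center j."""
    k = [gauss(x, z, gamma) for z in centers]
    return [sum(kj * row[c] for kj, row in zip(k, alpha)) for c in range(len(alpha[0]))]


def cost(x, y, centers, alpha, gamma):
    """State-estimation cost (9.25)."""
    s = surrogate(x, centers, alpha, gamma)
    return sum((a - b) ** 2 for a, b in zip(s, y)) / (2.0 * sum(b * b for b in y))


def cost_gradient(x, y, centers, alpha, gamma):
    """Gradient (D alpha) E^T / ||y||^2 of the cost."""
    d, q = len(x), len(y)
    E = [a - b for a, b in zip(surrogate(x, centers, alpha, gamma), y)]
    # D[r][j] = derivative of K(x, x_j) with respect to the r-th entry of x
    D = [[-2.0 * gamma ** 2 * (x[r] - z[r]) * gauss(x, z, gamma) for z in centers]
         for r in range(d)]
    # D alpha is d x q; multiplying by E^T leaves a vector of length d
    Da = [[sum(D[r][j] * alpha[j][c] for j in range(len(centers))) for c in range(q)]
          for r in range(d)]
    norm2 = sum(b * b for b in y)
    return [sum(Da[r][c] * E[c] for c in range(q)) / norm2 for r in range(d)]


centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]        # N = 3 centers in R^2
alpha = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]         # coefficients, N x q with q = 2
x, y, gamma = (0.4, 0.3), (0.5, 0.2), 0.8
grad = cost_gradient(x, y, centers, alpha, gamma)
```

The cost of assembling the gradient depends only on the number $N$ of centers and on the dimensions $d$, $q$, and a comparison with central finite differences of `cost` gives a quick sanity check of an implementation.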


9.7 Parameter and model selection 


For all the methods that we have seen the approximation quality of the surrogate
depends on several parameters, which need to be carefully chosen to obtain good results.
There are both parameters defining the kernel, such as the shape parameter $\gamma > 0$ in
an RBF kernel, and model parameters such as the regularization parameter $\lambda \geq 0$. To
some extent, also the selection of the kernel itself can be considered as a parametric
dependence of the model. Moreover, it is essential to test the quality of the surrogate
on an independent test set of data, since tuning it on the training set alone can very
likely lead to overfitting, i.e., to obtain a model that is excessively accurate on the
training set, while failing to generalize its prediction capabilities to unseen data.

In practical applications the target function $f$ is unknown, so it cannot be used
to check if the approximation is good, and all we know is the training set $(X_n, Y_n)$. In
this case the most common approach is to split the sets into train, validation and test
sets in the following sense. We permute $(X_n, Y_n)$, fix numbers $n_{tr}, n_{val}, n_{te}$ such that
$n = n_{tr} + n_{val} + n_{te}$, and define a partition of the dataset as

$$\begin{aligned}
X_{tr} &:= \{x_i,\ 1 \leq i \leq n_{tr}\}, \\
X_{val} &:= \{x_i,\ n_{tr} + 1 \leq i \leq n_{tr} + n_{val}\}, \\
X_{te} &:= \{x_i,\ n_{tr} + n_{val} + 1 \leq i \leq n\},
\end{aligned}$$

and similarly for $Y_{tr}$, $Y_{val}$, $Y_{te}$.

The idea is then to use the validation set $(X_{val}, Y_{val})$ to validate (i.e., choose) the
parameters, and the test set $(X_{te}, Y_{te})$ to evaluate the error. Having disjoint sets allows
one to have a fair way to test the algorithm.

For the process we also need an error function that returns the error of the surrogate
$s$ evaluated on a generic set of points $X := \{x_i\}_i \subset \Omega$ w.r.t. the exact values
$Y := \{y_i\}_i$. We denote by $|X|$ the number of elements of $X$. Common examples are the
maximal error and the Root Mean Square Error (RMSE) defined as

$$E(s, X, Y) := \max_{1 \leq i \leq |X|} \|s(x_i) - y_i\|_2 \quad \text{or} \quad E(s, X, Y) := \sqrt{\frac{1}{|X|} \sum_{i=1}^{|X|} \|s(x_i) - y_i\|_2^2}. \tag{9.26}$$
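Both error measures are straightforward to evaluate once the surrogate predictions on a set of points are available. The following sketch assumes that predictions and exact values are given as lists of equally sized tuples; the function names and the sample data are hypothetical.

```python
import math


def max_error(predictions, exact):
    """Largest Euclidean distance between a prediction and its exact value."""
    return max(math.dist(s, y) for s, y in zip(predictions, exact))


def rmse(predictions, exact):
    """Root mean square of the Euclidean distances."""
    squares = sum(math.dist(s, y) ** 2 for s, y in zip(predictions, exact))
    return math.sqrt(squares / len(predictions))


predictions = [(0.0, 0.0), (3.0, 4.0)]
exact = [(0.0, 0.0), (0.0, 0.0)]
e_max = max_error(predictions, exact)
e_rmse = rmse(predictions, exact)
```

Here the maximal error equals 5, while the RMSE equals $\sqrt{25/2} \approx 3.54$, which shows how the RMSE averages out a single large deviation.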

Then one chooses a set of possible parameter instantiations $\{p_1, \dots, p_{n_p}\}$, $n_p \in \mathbb{N}$,
that has to be checked. A common choice for positive numerical parameters is to take
them logarithmically equally spaced, since the correct scale is not known in advance,
in general.

The training and validation process is described in Algorithm 9.7, where we de- 
note by s(p;) the surrogate obtained with parameter p;. It works as an outer loop with 
respect to the training of any of the surrogates that we have considered, and it has thus 
to be understood as part of the offline phase. 


Algorithm 9.7: Model selection by validation.

1: Input: $X_{tr}, X_{val}, X_{te}, Y_{tr}, Y_{val}, Y_{te}$, $\{p_1, \dots, p_{n_p}\}$
2: for $i = 1, \dots, n_p$ do
3:   Train surrogate $s(p_i)$ with data $(X_{tr}, Y_{tr})$
4:   Compute validation error $e_i := E(s(p_i), X_{val}, Y_{val})$
5: end for
6: Choose parameter $\bar p := p_i$ with $i := \operatorname{argmin}_i e_i$
7: Train surrogate $s(\bar p)$ with data $(X_{tr} \cup X_{val}, Y_{tr} \cup Y_{val})$
8: Compute test error $\bar E := E(s(\bar p), X_{te}, Y_{te})$
9: Output: surrogate $s(\bar p)$, optimal parameter $\bar p$, test error $\bar E$


A more advanced way to realize the same idea is via k-fold cross validation. To have
an even better selection of the parameters, one can repeat the validation step (lines
2-6 in the previous algorithm) by changing the validation set at each step. To do so
we do not select a validation set (so $n = n_{tr} + n_{te}$), and instead consider a partition of
$X_{tr}$, $Y_{tr}$ into a fixed number $k \in \{1, \dots, n_{tr}\}$ of disjoint subsets, all approximately of the
same size, i.e.,

$$\begin{aligned}
X_{tr} &:= \{x_i,\ 1 \leq i \leq n_{tr}\} = \cup_{j=1}^k X_j, \\
X_{te} &:= \{x_i,\ n_{tr} + 1 \leq i \leq n\},
\end{aligned}$$

and similarly for $Y_{tr} := \cup_{j=1}^k Y_j$ and for $Y_{te}$. In the validation step each of the $X_j$ is used as
a validation set, and the validation is repeated for all $j = 1, \dots, k$. In this case the error
$e_i$ for the parameter $p_i$ is defined as the average error over all these permutations, as
described in Algorithm 9.8.


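Algorithm 9.8: Model selection by k-fold cross validation.

 1: Input: $X_{tr} = \cup_{j=1}^k X_j$, $X_{te}$, $Y_{tr} = \cup_{j=1}^k Y_j$, $Y_{te}$, $\{p_1, \dots, p_{n_p}\}$
 2: for $i = 1, \dots, n_p$ do
 3:   for $j = 1, \dots, k$ do
 4:     Train surrogate $s(p_i)$ with data $(\cup_{\ell \neq j} X_\ell, \cup_{\ell \neq j} Y_\ell)$
 5:     Compute error $e^{(j)} := E(s(p_i), X_j, Y_j)$
 6:   end for
 7:   $e_i := \operatorname{mean}\{e^{(j)},\ 1 \leq j \leq k\}$
 8: end for
 9: Choose parameter $\bar p := p_i$ with $i := \operatorname{argmin}_i e_i$
10: Train surrogate $s(\bar p)$ with data $(X_{tr}, Y_{tr})$
11: Compute test error $\bar E := E(s(\bar p), X_{te}, Y_{te})$
12: Output: surrogate $s(\bar p)$, optimal parameter $\bar p$, test error $\bar E$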


We remark that, in the extreme case $k = n_{tr}$, this k-fold cross validation is usually called
Leave One Out Cross Validation (LOOCV).
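The selection loop of the k-fold cross validation (lines 2-9 of Algorithm 9.8) can be sketched generically, with the training routine and the error function passed in as arguments. Everything in this sketch is illustrative: the names `log_grid`, `k_fold_select`, `train_ridge_line`, the interleaved fold assignment, and the toy model (a regularized line through the origin standing in for a kernel surrogate) are assumptions, and the final retraining and test steps are omitted.

```python
def log_grid(low_exp, high_exp, count):
    """`count` values logarithmically equally spaced in [10^low_exp, 10^high_exp]."""
    step = (high_exp - low_exp) / (count - 1)
    return [10.0 ** (low_exp + i * step) for i in range(count)]


def k_fold_select(X, Y, params, k, train, error):
    """Return the parameter with the smallest mean validation error over k folds."""
    n = len(X)
    folds = [list(range(j, n, k)) for j in range(k)]   # data assumed already permuted
    best_param, best_error = None, float("inf")
    for p in params:
        fold_errors = []
        for fold in folds:
            held_out = set(fold)
            X_fit = [X[i] for i in range(n) if i not in held_out]
            Y_fit = [Y[i] for i in range(n) if i not in held_out]
            model = train(X_fit, Y_fit, p)
            fold_errors.append(error(model, [X[i] for i in fold], [Y[i] for i in fold]))
        mean_error = sum(fold_errors) / k
        if mean_error < best_error:
            best_param, best_error = p, mean_error
    return best_param, best_error


def train_ridge_line(X, Y, lam):
    """Toy stand-in for a surrogate: regularized least squares line through 0."""
    slope = sum(x * y for x, y in zip(X, Y)) / (sum(x * x for x in X) + lam)
    return lambda x: slope * x


def max_abs_error(model, X, Y):
    return max(abs(model(x) - y) for x, y in zip(X, Y))


X = [0.5 * i for i in range(1, 11)]
Y = [2.0 * x for x in X]                    # noise-free data, so less regularization is better
params = log_grid(-4, 0, 5)                 # five values between 1e-4 and 1
best_param, best_error = k_fold_select(X, Y, params, k=5,
                                       train=train_ridge_line, error=max_abs_error)
```

On this noise-free toy data the smallest regularization parameter of the grid is selected; with noisy data the minimum of the mean validation error typically moves to an intermediate value.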


9.8 Numerical examples 


For the testing and illustration of the two methods of Section 9.4 and Section 9.5, we 
consider a real-world application dataset describing the biomechanical modeling of 
the human spine introduced and studied in [69]. We refer to that paper for further 
details and we just give a brief description in the following. 

The input-output function $f : \mathbb{R}^3 \to \mathbb{R}^3$ represents the coupling between a global
multibody system (MBS) and a Finite Elements (FEM) submodel. The human spine is
represented as an MBS consisting of the vertebrae, which are coupled by the interaction
through intervertebral disks (IVDs). The PDE representing the behavior of each IVD
is approximated by a FEM discretization, which takes the input geometry parameters
as boundary conditions, and computes the output mechanical response as a result
of the simulation. In particular, the three inputs are two spatial displacements and
an angular inclination of a vertebra, and the three outputs are the corresponding two
force components and the momentum which are transferred to the next vertebra. The
dataset is generated by running the full model for $n := 1370$ different input parameters
$X_n$, and generating the corresponding set of outputs $Y_n$.
The dataset, as described in Section 9.7, is first randomly permuted and then divided
into training and test datasets $(X_{n_{tr}}, Y_{n_{tr}})$, $(X_{n_{te}}, Y_{n_{te}})$ with $n_{tr} := 1238$ and $n_{te} := 132$,
corresponding to roughly 90% and 10% of the data. We remark that the full model
predicts a value $(0,0,0)^T$ for the input $(0,0,0)^T$ and this sample pair is present in the
dataset. We thus manually include it in the training set independently of the permutation.
The training and test sets can be seen in Figure 9.2.




Figure 9.2: Input parameters (left) and corresponding outputs (right) for the training (top row) and 
test set (bottom row). 


The models are trained using a Matlab implementation of the algorithms. For VKOGA
we use our own implementation,⁷ while for SVR we employ the KerMor package,⁸
which provides an implementation of the 2-index SMO for the SVR without offset that
is discussed in Section 9.5.1. We remark that this implementation requires the output
data to be scaled in $[-1,1]$, and thus we perform this scaling for the training and
validation, while the testing is executed by scaling back the predictions to the original
range. To have a fair comparison, we use the same data normalization also for the
VKOGA models.


7 https://gitlab.mathematik.uni-stuttgart.de/pub/ians-anm/vkoga 
8 https://www.morepas.org/software/kermor/index.html 
The regularized VKOGA (with f-, P-, and f/P-greedy selection rules) and the SVR
models are trained with the Gaussian kernel. Both algorithms depend on the shape
parameter $\gamma$ of the kernel and on the regularization parameter $\lambda$, while SVR additionally
depends on the width $\varepsilon$ of the tube. These parameters are selected by k-fold cross
validation as described in Section 9.7. The values of $k$ and of the parameter samples
used for validation are reported in Table 9.1, where each parameter set is obtained by
generating logarithmically equally spaced samples in the given interval, i.e., 400
parameter pairs are tested for VKOGA and 4000 triples for SVR. As an error measure we
use the max error in (9.26). We remark that the SVR surrogate is obtained by training
a separate model for each output, as described in Section 9.5, but only one cross
validation is used. This means that for each parameter triple three models are trained,
and then the parameter is evaluated in the prediction of the three-dimensional output.


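Table 9.1: Parameter ranges and sample numbers used in the k-fold cross validation.


$k$   $\gamma_{\min}$   $\gamma_{\max}$   $n_\gamma$   $\lambda_{\min}$   $\lambda_{\max}$   $n_\lambda$   $\varepsilon_{\min}$   $\varepsilon_{\max}$   $n_\varepsilon$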


5 107 10' 20 #10776 10 20 101° 10% 10 


Moreover, the training of the VKOGA surrogates is terminated when the square of the 
power function is below the tolerance Tp := 107’, or when the training error is below 
the tolerance Tp := 10°. Additionally, it would be possible to use a maximal number 
of selected points as stopping criterion, and this offers the significant advantage of 
directly controlling the expansion size, which could be reduced to any given number 
(of course at the price of a reduced accuracy). In the case of SVR, instead, the number 
N is a result of the tuning of the remaining parameters. 

In Table 9.2 we report the values of the parameters selected by the validation procedure
for the four models, as well as the number N of nonzero coefficients in the
trained kernel expansions. Observe that for SVR the three values of N refer to the number
of support vectors of the three scalar-valued models. The total number of support
vectors over the three SVR models is only slightly larger than the number of kernel centers
of each VKOGA model, but, as discussed in the following, the VKOGA models give prediction
errors which are up to two orders of magnitude smaller than those of the SVR model.

We can now test the four models in the prediction on the test set. Table 9.3 contains
various error measures between the prediction of the surrogates and the exact data.
We report the values of the maximum error $E_{\max}$ and the RMSE $E_{\mathrm{RMSE}}$ defined in (9.26),
and the relative maximum error $E_{\max,\mathrm{rel}}$ obtained by scaling each error by the norm of
the exact output.
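
As a small illustration of these three quantities, the following Python sketch computes them for a toy data set. Since (9.26) is not restated here, the formulas in the code are our assumption of the usual definitions (maximum and root mean square of the Euclidean errors over the test set); the function names and the data are purely illustrative.

```python
import math

def norm2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def error_measures(exact, predicted):
    """Return (E_max, E_RMSE, E_max_rel) under the assumed definitions."""
    errs = [norm2([a - b for a, b in zip(y, s)]) for y, s in zip(exact, predicted)]
    e_max = max(errs)
    e_rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    # Relative error: each error is scaled by the norm of the exact output.
    e_max_rel = max(e / norm2(y) for e, y in zip(errs, exact))
    return e_max, e_rmse, e_max_rel

# Toy data: one output of large norm, one of small norm.
exact = [[3.0, 4.0, 0.0], [0.0, 0.1, 0.0]]
predicted = [[3.0, 4.0, 1.0], [0.0, 0.0, 0.0]]
e_max, e_rmse, e_max_rel = error_measures(exact, predicted)
```

In this toy example the absolute measures are dominated by the sample of large norm, while the relative maximum error is attained at the sample of small norm, which is the same qualitative effect discussed for the test set below.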

To provide better insight into the approximation quality of the methods, we show
in Figure 9.3 the distribution of the error over the test set. The plots show, for each
sample $(x_i, y_i)$ in the test set, the absolute error $\|y_i - s(x_i)\|_2$ as a function of the magnitude
$\|y_i\|_2$ of the output. Moreover, the black lines represent a relative error from 10° to 107.
It is clear that in all cases the maximum and RMS errors of Table 9.3 are dominated
by the values obtained for outputs of large norm, where the VKOGA models obtain a
much better accuracy than SVR. The relative errors, on the other hand, are not evenly
distributed for SVR, where most of the test set is approximated with a relative error
between 10! and 10”? except for the samples with small magnitude of the output. For
these data, the model gives increasingly bad predictions as the magnitude decreases,
reaching a relative error much larger than 1. The VKOGA models, instead, obtain a relative
error smaller than 10“ on the full test set except for the entries of small magnitude.
For these samples, the f- and P-greedy versions of the algorithm perform almost the
same and better than the f/P-greedy variant, thus giving an overall smaller relative
error in Table 9.3. Moreover, these results are obtained with a significantly smaller
expansion size for f-greedy than for P-greedy. Finally, even if the SVR surrogates for the
individual output components are smaller than the VKOGA ones, the overall number
of nonzero coefficients is 359 + 378 + 405 = 1142, i.e., more than that of each of the
three VKOGA models, thus leading to a less accurate and more expensive surrogate.


Table 9.2: Selected parameters and number of nonzero coefficients in the kernel expansions.


Method N $\gamma$ $\lambda$ $\varepsilon$


VKOGA P-greedy 1000 4.9 -107° 1071! -
VKOGA f-greedy 879 4.3 -107° 1071? -
VKOGA f/P-greedy 967 6.2-10°7 10°? -
SVR, output 1 359 1.8.1071 10? 7.7-107
output 2 378
output 3 405


Table 9.3: Test errors: maximum error $E_{\max}$, RMSE $E_{\mathrm{RMSE}}$, and maximum relative error $E_{\max,\mathrm{rel}}$.


Method $E_{\max}$ $E_{\mathrm{RMSE}}$ $E_{\max,\mathrm{rel}}$


VKOGA P-greedy 1.6 10? 22.3 2.2.1071
VKOGA f-greedy 1.6 -10° 22.4 2.0- 107!
VKOGA f/P-greedy 1.6 -10° 23.2 8.8. 107!
SVR 1.3 -10° 1.6- 10? 1.4 -10t


Figure 9.3: Absolute errors as functions of the magnitude of the output, and relative error levels from
10° to 107° for the surrogates obtained with P-greedy VKOGA (top left), f-greedy VKOGA (top right),
f/P-greedy VKOGA (bottom left) and SVR (bottom right).

Regarding the runtime requirements, we can now estimate both the offline (training)
and the online (prediction) times. The offline time required for the validation and
training of the models is essentially determined by the number of parameters tested
in the k-fold cross validation, while the training time of a single model is almost negligible.
As a comparison, we report in Table 9.4 the average runtime $T_{\mathrm{offline}}$ over 10 runs of
the training of the models for the fixed set of parameters of Table 9.2. All the reported
times are in the range of seconds (for VKOGA) and below one minute (for SVR). We
remark that this timing is only a very rough indication and not a precise comparison,
since the times highly depend on the number of selected points (for VKOGA) and on the
number of support vectors (for SVR), and both depend on the parameters used.
For example, we repeated the experiment for SVR with the same parameter set but
with $\varepsilon = 10^{-1}$. In this case this value of $\varepsilon$ is overly large (compared to the one
selected by cross validation) and it likely produces a useless model, but we
obtain an average training time of only 0.03 sec.


Table 9.4: Average offline time (training only), online time, and projected speedup factor for the four 
different models. 


Method N $T_{\mathrm{offline}}$ $T_{\mathrm{online}}$ $T_{\mathrm{full}}/T_{\mathrm{online}}$
VKOGA P-greedy 1000 1.67 sec 9.97 - 10° sec 3.01- 10° 
VKOGA f-greedy 879 1.41 sec 9.44-10° © sec 3.18- 10° 
VKOGA f /P-greedy 967 1.66 sec 9.92-10°° sec 3.02- 10° 


SVR (3 models) 1142 52.0 sec 2.28- 107° sec 1.32-10° 

A more interesting comparison is the online time, which directly determines the efficiency
of the surrogate models in the replacement of the full simulation. In this case,
we evaluate the models 5000 times on the full test set consisting of $n_{\mathrm{te}} = 132$ samples,
and we report the average online time $T_{\mathrm{online}}$ per single test sample in Table 9.4.
The table also repeats the number N of elements of the corresponding kernel
expansions, and it is evident that a smaller value leads to a faster evaluation of the
model.

In the original paper [69], it has been estimated that a 30 sec full simulation with
24 IVDs with a timestep $\Delta t = 10^{-3}$ sec requires $7.2 \cdot 10^5$ evaluations of the coupling
function f, and these were estimated to require 600 h. This corresponds to an average
of $T_{\mathrm{full}} = 3$ sec per evaluation of f, giving a speedup $T_{\mathrm{full}}/T_{\mathrm{online}}$ as reported in Table 9.4.
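
The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes one evaluation of f per IVD and per time step, which is our reading of the numbers above; the online time passed to `speedup` in the last line is a hypothetical value, not an entry of Table 9.4.

```python
n_ivd = 24            # number of IVDs in the full simulation
t_sim = 30.0          # simulated time in seconds
dt = 1e-3             # time step in seconds
total_hours = 600.0   # estimated cost of all evaluations of f

# Assumption: one evaluation of f per IVD and per time step.
n_steps = round(t_sim / dt)
n_evals = n_ivd * n_steps                 # 720000 = 7.2e5 evaluations
t_full = total_hours * 3600.0 / n_evals   # average seconds per evaluation

def speedup(t_online):
    """Projected speedup T_full / T_online for a given online time in seconds."""
    return t_full / t_online

print(n_evals, t_full)
print(speedup(1e-5))  # hypothetical online time of 1e-5 sec
```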

These surrogates can now be employed to solve different tasks that require mul- 
tiple evaluations of f. As an example, we employ the f-greedy model (as the most ac- 
curate and most efficient) to solve a parameter estimation problem as described in 
Section 9.6. We consider the output values $Y_{\mathrm{te}}$ in the test set as a set of measurements that
have not been used in the training of the model, and we try to estimate the values
of $X_{\mathrm{te}}$. For each output vector $y_i \in \mathbb{R}^3$ we define a target value $\bar{y}_i := y_i + \eta \|y_i\|_2 v$ to define
the cost (9.25), where $v \in \mathbb{R}^3$ is a uniform random vector representing some noise, and
$\eta \in [0, 1]$ is a noise level. We then use a built-in Matlab optimizer with the gradient
of Proposition 9.8, with the zero vector of the input space as initial guess $x_0 := 0$, to obtain
an estimate $x_i^*$ of $x_i$. The results of the estimate for each output value in the test set are
depicted in Figure 9.4 for $\eta = 0, 0.1$, where we report also the final value of the cost function
$C(x_i^*)$. In all cases,
the optimizer seems to converge, since the value of the cost function is in all cases 
smaller than 1074, which represents a relative value smaller than 10°? with respect to 
the magnitude of the input values. The maximum absolute error in the estimations 
is quite uniform for all the samples in the test set, and this results in a good relative 
error of about $10^{-1}$ for large inputs, while for inputs of very small magnitude the relative
error is larger than 1, and a larger noise level leads to less accurate predictions.
This behavior is coherent with the analysis of the test error discussed above, since the 
approximant is less accurate on inputs of small magnitude, and thus it provides a less 
reliable surrogate in the cost function. 
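
The structure of this experiment can be sketched in Python as follows. Several ingredients are assumptions made only for illustration: the cost is taken as the squared misfit $\frac{1}{2}\|s(x) - \bar{y}\|_2^2$ (the form of (9.25) is not restated here), the surrogate s is replaced by a hypothetical linear map, the noise vector v is fixed instead of random, and plain gradient descent is used in place of the Matlab optimizer.

```python
import math

# Hypothetical linear "surrogate" s(x) = A x from R^2 to R^3 (illustration only).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def s(x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(v):
    return math.sqrt(sum(c * c for c in v))

def noisy_target(y, eta, v):
    """Target y + eta * ||y||_2 * v for noise level eta and noise vector v."""
    scale = eta * norm2(y)
    return [yi + scale * vi for yi, vi in zip(y, v)]

def cost(x, target):
    """Assumed misfit cost 0.5 * ||s(x) - target||_2^2."""
    return 0.5 * sum((si - ti) ** 2 for si, ti in zip(s(x), target))

def grad(x, target):
    """Gradient of the cost: A^T (s(x) - target) for the linear toy surrogate."""
    r = [si - ti for si, ti in zip(s(x), target)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def estimate(target, steps=500, lr=0.2):
    """Gradient descent started from the zero vector."""
    x = [0.0, 0.0]
    for _ in range(steps):
        x = [xj - lr * gj for xj, gj in zip(x, grad(x, target))]
    return x

x_true = [1.0, -2.0]
y = s(x_true)
v = [0.5, -0.5, 0.5]  # fixed noise vector, used instead of a random draw

x_clean = estimate(noisy_target(y, 0.0, v))
x_noisy = estimate(noisy_target(y, 0.1, v))
err_clean = norm2([a - b for a, b in zip(x_clean, x_true)])
err_noisy = norm2([a - b for a, b in zip(x_noisy, x_true)])
```

With the noise level set to zero the input is recovered up to the optimizer tolerance, while a positive noise level shifts the minimizer of the cost and increases the estimation error.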


9.9 Conclusions and outlook 


In this chapter we discussed the use of kernel methods to construct surrogate models 
based on scattered data samples. These methods can be applied to data with general 
structure, and they scale well with the dimension of the input and output values. 
In particular, we analyzed issues and methods to obtain sparse solutions, which are 
then extremely fast to evaluate, while still being very accurate. These properties have 
been further demonstrated on numerical tests on a real application dataset. These 
methods can be analyzed in the common framework of Reproducing Kernel Hilbert
Spaces, which provides solid theoretical foundations and a high flexibility to derive
new algorithms.


Figure 9.4: Absolute errors of the input estimation as functions of the magnitude of the output (left),
and value of the cost function at the estimated input (right) for a noise level $\eta = 0$ (top row) and
$\eta = 0.1$ (bottom row) using the f-greedy VKOGA model. The dotted lines represent relative error
levels from 10° to 10°.

The integration of machine learning and model reduction is promising and many 
interesting aspects have still to be investigated. For example, surrogate models have 
been used in [23, 24] to learn a representation with respect to projection-based meth- 
ods, and generally a more extensive application of machine learning to dynamical sys- 
tems requires additional understanding and the derivation of new techniques. More- 
over, the field of data-based numerics is very promising, where classical numerical 
methods are integrated or accelerated with data-based models. 


Bibliography 


[1] M. Alvarez, L. Rosasco, and N. D. Lawrence. Kernels for vector-valued functions: a review. 
Found. Trends Mach. Learn., 4(3):195-266, 2012. 
[2] N. Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337-404, 1950. 


[3] A. Beckert and H. Wendland. Multivariate interpolation for fluid-structure-interaction problems
using radial basis functions. Aerosp. Sci. Technol., 5(2):125-134, 2001.
[4] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In
Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, pages
144-152. ACM, New York, NY, USA, 1992.
[5] J. Bouvrie and B. Hamzi. Kernel methods for the approximation of nonlinear systems. SIAM J.
Control Optim., 55(4):2460-2492, 2017.
[6] T. Brünnette, G. Santin, and B. Haasdonk. Greedy kernel methods for accelerating implicit
integrators for parametric ODEs. In F. A. Radu, K. Kumar, I. Berre, J. M. Nordbotten and I. S. Pop,
editors, Numerical Mathematics and Advanced Applications - ENUMATH 2017, pages 889-896.
Springer, Cham, 2019.
[7] R. Cavoretto, A. De Rossi, and E. Perracchione. Efficient computation of partition of
unity interpolants through a block-based searching technique. Comput. Math. Appl.,
71(12):2568-2584, 2016.
[8] R. Cavoretto, G. Fasshauer, and M. McCourt. An introduction to the Hilbert-Schmidt SVD using
iterated Brownian bridge kernels. Numer. Algorithms, 68(2):1-30, 2014.
[9] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell.
Syst. Technol., 2:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[10] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci.
Comput., 20(1):33-61, 1998.
[11] W. Chen, Z.-J. Fu, and C.-S. Chen. Recent advances in radial basis function collocation methods.
Springer, 2014.
[12] S. Deparis, D. Forti, and A. Quarteroni. A rescaled localized Radial Basis Function interpolation
on non-Cartesian and nonconforming grids. SIAM J. Sci. Comput., 36(6):A2745-A2762, 2014.
[13] S. Deparis, D. Forti, and A. Quarteroni. A Fluid-Structure Interaction Algorithm Using Radial
Basis Function Interpolation Between Non-Conforming Interfaces, pages 439-450. Springer,
Cham, 2016.
[14] M. Drohmann and K. Carlberg. The ROMES method for statistical modeling of
Reduced-Order-Model error. SIAM/ASA J. Uncertain. Quantificat., 3(1):116-145, 2015.
[15] G. E. Fasshauer and M. McCourt. Kernel-Based Approximation Methods Using MATLAB.
Interdisciplinary Mathematical Sciences, volume 19. World Scientific Publishing Co. Pte. Ltd.,
Hackensack, NJ, 2015.
[16] G. E. Fasshauer and M. J. McCourt. Stable evaluation of Gaussian radial basis function
interpolants. SIAM J. Sci. Comput., 34(2):A737-A762, 2012.
[17] B. Fornberg and N. Flyer. A primer on radial basis functions with applications to the
geosciences. SIAM, 2015.
[18] B. Fornberg, E. Larsson, and N. Flyer. Stable computations with Gaussian radial basis
functions. SIAM J. Sci. Comput., 33(2):869-892, 2011.
[19] J. Garcke and M. Griebel. Sparse grids and applications, volume 88. Springer, 2012.
[20] T. Gärtner, J. W. Lloyd, and P. A. Flach. Kernels for structured data. In S. Matwin and C. Sammut,
editors, Inductive Logic Programming, pages 66-83. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2003.
[21] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
http://www.deeplearningbook.org.
[22] S. Grundel, N. Hornung, B. Klaassen, P. Benner, and T. Clees. Computing Surrogates for Gas
Network Simulation Using Model Order Reduction, pages 189-212. Springer New York, New
York, NY, 2013.
[23] M. Guo and J. S. Hesthaven. Reduced order modeling for nonlinear structural analysis using
Gaussian process regression. Comput. Methods Appl. Mech. Eng., 341:807-826, 2018.


[24] M. Guo and J. S. Hesthaven. Data-driven reduced order modeling for time-dependent problems.
Comput. Methods Appl. Mech. Eng., 345:75-99, 2019.
[25] B. Haasdonk and G. Santin. Greedy kernel approximation for sparse surrogate modeling. In
W. Keiper, A. Milde and S. Volkwein, editors, Reduced-Order Modeling (ROM) for Simulation
and Optimization: Powerful Algorithms as Key Enablers for Scientific Computing, pages 21-45.
Springer, Cham, 2018.
[26] D. Haussler. Convolution kernels on discrete structures. Technical Report UCS-CRL-99-10, UC
Santa Cruz, 1999.
[27] G. S. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation on stochastic
processes and smoothing by splines. Ann. Math. Stat., 41(2):495-502, 1970.
[28] M. Köppel, F. Franzelin, I. Kroker, S. Oladyshkin, G. Santin, D. Wittwar, A. Barth, B. Haasdonk,
W. Nowak, D. Pflüger, and C. Rohde. Comparison of data-driven uncertainty quantification
methods for a carbon dioxide storage benchmark scenario. Comput. Geosci., 23(2):339-354,
2019.
[29] T. Köppl, G. Santin, B. Haasdonk, and R. Helmig. Numerical modelling of a peripheral arterial
stenosis using dimensionally reduced models and kernel methods. Int. J. Numer. Methods
Biomed. Eng., 34(8):e3095, 2018. cnm.3095.
[30] M. Kowalewski, E. Larsson, and A. Heryudono. An adaptive interpolation scheme for molecular
potential energy surfaces. J. Chem. Phys., 145(8):084104, 2016.
[31] E. Larsson, E. Lehto, A. Heryudono, and B. Fornberg. Stable computation of differentiation
matrices and scattered node stencils based on Gaussian radial basis functions. SIAM J. Sci.
Comput., 35(4):A2096-A2119, 2013.
[32] A. Manzoni and F. Negri. Heuristic strategies for the approximation of stability factors in
quadratically nonlinear parametrized PDEs. Adv. Comput. Math., 41(5):1255-1288, 2015.
[33] E. Marchandise, C. Piret, and J.-F. Remacle. CAD and mesh repair with Radial Basis Functions. J.
Comput. Phys., 231(5):2376-2387, 2012.
[34] I. Martini. Reduced Basis Approximation for Heterogeneous Domain Decomposition Problems.
PhD thesis, IANS, University of Stuttgart, 2017.
[35] C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Comput.,
17(4):177-204, 2005.
[36] S. Müller. Komplexität und Stabilität von kernbasierten Rekonstruktionsmethoden (Complexity
and Stability of Kernel-based Reconstructions). PhD thesis, Fakultät für Mathematik und
Informatik, Georg-August-Universität Göttingen, 2009.
[37] S. Müller and R. Schaback. A Newton basis for kernel spaces. J. Approx. Theory,
161(2):645-655, 2009.
[38] R. A. Olea. Geostatistics for engineers and earth scientists. Springer, 2012.
[39] M. Pazouki and R. Schaback. Bases for kernel-based spaces. J. Comput. Appl. Math.,
236(4):575-588, 2011.
[40] B. Peherstorfer and Y. Marzouk. A transport-based multifidelity preconditioner for Markov
chain Monte Carlo. Adv. Comput. Math., 45(5):2321-2348, 2019.
[41] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector
machines. Technical report, April 1998.
[42] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT
Press, 2006.
[43] C. Rieger and B. Zwicknagl. Sampling inequalities for infinitely smooth functions, with
applications to interpolation and machine learning. Adv. Comput. Math., 32(1):103-129, 2008.
[44] C. Rieger and B. Zwicknagl. Deterministic error analysis of support vector regression and
related regularized kernel methods. J. Mach. Learn. Res., 10:2115-2132, 2009.
[45] S. Saitoh and Y. Sawano. Theory of Reproducing Kernels and Applications. Developments in
Mathematics, volume 44. Springer, Singapore, 2016.


[46] G. Santin and B. Haasdonk. Convergence rate of the data-independent P-greedy algorithm in
kernel-based approximation. Dolomites Res. Notes Approx., 10:68-78, 2017.
[47] G. Santin, D. Wittwar, and B. Haasdonk. Greedy regularized kernel interpolation. University of
Stuttgart, 2018. ArXiv preprint 1807.09575.
[48] R. Schaback. Error estimates and condition numbers for radial basis function interpolation.
Adv. Comput. Math., 3(3):251-264, 1995.
[49] R. Schaback and H. Wendland. Approximation by positive definite kernels. In M. Buhmann and
D. Mache, editors, Advanced Problems in Constructive Approximation. International Series in
Numerical Mathematics, volume 142, pages 203-221. 2002.
[50] M. Scheuerer, R. Schaback, and M. Schlather. Interpolation of spatial data - a stochastic or a
deterministic problem? Eur. J. Appl. Math., 24(4):601-629, 2013.
[51] A. Schmidt and B. Haasdonk. Data-driven surrogates of value functions and applications to
feedback control for dynamical systems. IFAC-PapersOnLine, 51(2):307-312, 2018. 9th Vienna
International Conference on Mathematical Modelling.
[52] B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In D. Helmbold
and B. Williamson, editors, Computational Learning Theory, pages 416-426. Springer Berlin
Heidelberg, Berlin, Heidelberg, 2001.
[53] B. Schölkopf and A. Smola. Learning with Kernels. The MIT Press, 2002.
[54] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University
Press, 2004.
[55] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
[56] I. Steinwart, D. Hush, and C. Scovel. An explicit description of the Reproducing Kernel Hilbert
Spaces of Gaussian RBF kernels. IEEE Trans. Inf. Theory, 52(10):4635-4643, 2006.
[57] I. Steinwart, D. Hush, and C. Scovel. Training SVMs Without Offset. J. Mach. Learn. Res.,
12:141-202, 2011.
[58] I. Steinwart and P. Thomann. liquidSVM: A fast and versatile SVM package, 2017.
arXiv:1702.06899.
[59] J. Suykens, J. Vanderwalle, and B. D. Moor. Optimal control by least squares support vector
machines. Neural Netw., 14:23-35, 2001.
[60] T. Taddei, J. D. Penn, M. Yano, and A. T. Patera. Simulation-based classification; a
model-order-reduction approach for structural health monitoring. Arch. Comput. Methods Eng.,
1-23, 2016.
[61] R. Tibshirani. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B,
58(1):267-288, 1996.
[62] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
[63] H. Wendland. Piecewise polynomial, positive definite and compactly supported radial
functions of minimal degree. Adv. Comput. Math., 4(1):389-396, 1995.
[64] H. Wendland. Fast evaluation of radial basis functions: methods based on partition of unity. In
Approximation theory, X (St. Louis, MO, 2001). Innov. Appl. Math., pages 473-483. Vanderbilt
Univ. Press, Nashville, TN, 2002.
[65] H. Wendland. Scattered Data Approximation. Cambridge Monographs on Applied and
Computational Mathematics, volume 17. Cambridge University Press, Cambridge, 2005.
[66] H. Wendland and C. Rieger. Approximate interpolation with applications to selecting
smoothing parameters. Numer. Math., 101(4):729-748, 2005.
[67] D. Wirtz and B. Haasdonk. A-posteriori error estimation for parameterized kernel-based
systems. In Proc. MATHMOD 2012 - 7th Vienna International Conference on Mathematical
Modelling, 2012.
[68] D. Wirtz and B. Haasdonk. A vectorial kernel orthogonal greedy algorithm. Dolomites Res.
Notes Approx., 6:83-100, 2013.


[69] D. Wirtz, N. Karajan, and B. Haasdonk. Surrogate modelling of multiscale models using kernel
methods. Int. J. Numer. Methods Eng., 101(1):1-28, 2015.
[70] D. Wittwar, G. Santin, and B. Haasdonk. Interpolation with uncoupled separable matrix-valued
kernels. Dolomites Res. Notes Approx., 11:23-29, 2018.
[71] H. Zhang and L. Zhao. On the inclusion relation of reproducing kernel Hilbert spaces. Anal.
Appl., 11, 2013.


Jack P. C. Kleijnen 


10 Kriging: methods and applications 


Abstract: In this chapter we present Kriging—also known as a Gaussian process (GP) 
model—which is a relatively simple metamodel—or emulator or surrogate—of the 
corresponding complex simulation model. To select the input combinations to be 
simulated, we use Latin hypercube sampling (LHS); these combinations may have 
uniform and non-uniform distributions. Besides deterministic simulation we discuss 
random—or stochastic—simulation, which requires adjusting the design and analy- 
sis. We discuss sensitivity analysis of simulation models, using “functional analysis 
of variance” (FANOVA)—also known as Sobol sensitivity indices. Finally, we discuss 
optimization of the simulated system, including “robust” optimization. 


Keywords: Gaussian process, metamodel, emulator, surrogate, optimization 


10.1 Introduction 


Kriging is the mathematical interpolation method that is named after the South 
African mining-engineer Krige (who lived from 1919 through 2013). He solved the 
problem of interpolating the outputs (or responses) that were obtained at a limited 
number of locations for gold mining; see the details on Krige’s life in [32]. 

Next, Krige’s method was formalized by the French mathematician Matheron 
(1930-2000), who developed a novel type of mathematical statistics—called geo- 
statistics or spatial statistics. He based this formalization on the stationary Gaussian 
process (GP). This stationarity implies that the GP has a constant mean (expected 
value), a constant variance, and covariances that depend only on the distances between
“points” in a (say) k-dimensional space; obviously, in spatial statistics $k \le 3$
(length, width, height). This GP defines a multivariate normal (or Gaussian) distribu- 
tion. Spatial statistics is detailed in [14], which is a popular textbook with 900 pages; 
other books that reflect the French tradition are [12] and [40]. Recent survey articles 
are [4] and [13]. The connection between Krige and Matheron is also discussed in [32].

Later on, these GPs were applied in machine learning, which is a “hot” subdisci- 
pline within computer science. The best-known textbook on GPs in machine learning 
is [35]. 

However, in this chapter we focus on the development and application of GPs in 
experiments with computerized simulation models; this field is known as design and 
analysis of computer experiments (DACE). Obviously, these models may have many inputs,
which implies k > 1. The pioneering article on DACE is [36]; a recent textbook is
[38]. DACE publications focus on deterministic simulation, whereas we shall also dis- 
cuss random or stochastic simulation (e. g., queueing simulation models). A random 
simulation uses pseudo-random numbers (PRNs), which by definition are uniformly 
distributed on [0,1] and mutually independent. The design and analysis of experi- 
ments with random simulation models require the development of novel statistical 
methods, such as methods for sample-size determination and stochastic Kriging (SK). 
Indeed, we need to determine these sample sizes because we should select the number 
of replications per simulated point (input combination) in order to control the noise 
created by the PRNs (in Section 10.4 we shall see that this problem is not yet solved 
satisfactorily). We need SK because we should account for the so-called “intrinsic” 
noise created by the PRNs. The pioneering article on SK is [1]. 

We call a GP a metamodel of the underlying simulation model; i. e., this meta- 
model is a simpler and explicit mathematical function that approximates the complex 
and implicit function defined by the simulation model (this model is either deterministic
or random). The DACE literature often calls the metamodel a surrogate or an emulator.
We shall see (in the next section) that Kriging also quantifies the uncertainty of
its predictor (say) $\hat{y}$; i.e., Kriging also gives $\mathrm{Var}(\hat{y})$, whereas many other metamodels
(e.g., neural nets, splines) do not.

Kriging may have different goals; see [22, p.9]. We shall discuss prediction, sen- 
sitivity analysis, and optimization. Besides these goals, [2] also discusses uncertainty 
quantification and uncertainty propagation. 

Note: Simulation is applied in many scientific disciplines, which have their own 
terminologies and mathematical symbols. We use the terminology and symbols in [22]; 
e.g., we write Gaussian and Kriging with capitals (because these words refer to the 
proper names Gauss and Krige), and we use the symbol k (instead of d, which is used 
in many other publications on GP). The hasty reader may skip paragraphs that start 
with “Note:”. 

Note: [30, 29] consider so-called intrinsic Kriging (IK)—which originated in geostatistics— 
and derive several IK types for deterministic simulations and random simulations, 
respectively. 

We base this chapter on the Chapters 5 and 6 in [22], but we update these chapters, 
adding novel methods and applications for Kriging. Furthermore, corrections and ad- 
ditions for [22] are available on https://sites.google.com/site/kleijnenjackpc/home/ 
publications/corrections-additions-of-2015-springer-book. 

Besides [22] we also use [24, 25]. We do not use a Bayesian approach, which is used 
in many publications on GP. 

We organize the rest of this chapter as follows. In Section 10.2 we present so-called 
“Ordinary Kriging” (OK), comparing OK with popular linear-regression metamodels. 
In Section 10.3 we present Latin hypercube sampling (LHS) for selecting the input com- 
binations to be simulated, which results in the input/output (I/O) data analyzed by 
OK. In Section 10.4 we consider the adjustments in the design and analysis of random
simulation when using Kriging. In Section 10.5 we discuss sensitivity analysis (SA) of 
simulation models that are analyzed through Kriging; this SA may be global instead 
of local, and use “functional analysis of variance” (FANOVA)—also known as Sobol 
sensitivity indices. In Section 10.6 we discuss optimization of the simulated system 
including “robust” optimization, using Kriging. In Section 10.7 we present our con- 
clusions, including topics that require further research. 


10.2 Ordinary Kriging 


To explain OK, we start with the following linear-regression model (we assume that
the readers are familiar with this model; otherwise, they can read Chapters 2 and 3 in
[22]); we add the subscript “reg” to distinguish between regression and OK models:


$$y_{\mathrm{reg}} = X_{\mathrm{reg}} \beta_{\mathrm{reg}} + e_{\mathrm{reg}} \qquad (10.1)$$


where $y_{\mathrm{reg}}$ denotes the n-dimensional vector with the observations on the dependent
(explained) regression variable, where n denotes the number of observed (simulated)
input combinations or “points”; $X_{\mathrm{reg}}$ is the $n \times q$ matrix of independent (explanatory)
regression variables; $\beta_{\mathrm{reg}}$ is the q-dimensional vector with regression parameters
(coefficients); and $e_{\mathrm{reg}}$ is the n-dimensional vector with the residuals $E(y_{\mathrm{reg}}) - E(w)$, where
$w$ is the n-dimensional vector with simulated outputs. $X_{\mathrm{reg}}$ consists of the n rows with
the q-dimensional vectors $x_i$ with $i = 1, \ldots, n$. We assume a univariate (scalar) output.
The simulation model has k inputs $x_j$ ($j = 1, \ldots, k$); an independent regression
variable may be identical to a simulation input or it may be a function of one or more
simulation inputs; e.g., $x_{\mathrm{reg};2} = x_1^2$ or $x_{\mathrm{reg};3} = x_1 x_2$. Classic regression analysis assumes
that $e_{\mathrm{reg}}$ is white noise; i.e., $e_{\mathrm{reg}}$ is normally (Gaussian) distributed with zero means,
constant variances (say) $\sigma^2$, and zero correlations, so the covariance matrix of $e_{\mathrm{reg}}$ is
$\Sigma_{\mathrm{reg}} = \sigma^2 I_{n \times n}$, where $I_{n \times n}$ denotes the $n \times n$ identity matrix.

Because white noise implies independent residuals, we cannot learn from the 
residuals. Kriging, however, assumes that the residuals at two points (say) x and x’ 
have values that are more similar, as x and x’ are closer; i.e., Kriging assumes that the 
residuals are positively correlated. 

The simplest Kriging model is the OK model 


$$y(x) = \mu + M(x) \qquad (10.2)$$


where $\mu$ is the constant mean $E[y(x)]$ and $M(x)$ is a zero-mean stationary GP (a more
complicated type of Kriging may replace this $\mu$ by a low-order polynomial; see universal
Kriging or UK, discussed in the Note at the end of this section). $M(x)$ is called
the extrinsic noise, because the term “intrinsic noise” is used for random simulation
analyzed through SK.

The OK model leads to the linear predictor (say) $\hat{y}(x_0)$ for the new point $x_0 = (x_{0;j})$
that combines the n old outputs collected in $w_n$ (or briefly $w$) that are observed at the
n old points with k inputs, so the $n \times k$ matrix $X$ has the rows $x_i = (x_{i;1}, \ldots, x_{i;k})$ with
$i = 1, \ldots, n$, and $\hat{y}(x_0)$ uses the n weights $\lambda_i$ collected in the vector $\lambda$:


$$\hat{y}(x_0) = \sum_{i=1}^{n} \lambda_i w(x_i) = \lambda' w. \qquad (10.3)$$


To derive the optimal $\lambda$, we use the best linear unbiased predictor (BLUP) criterion. By
definition, the predictor $\hat{y}(x)$ is unbiased if $E[\hat{y}(x)] = E[y(x)]$. This implies that $x = x_i$
gives $\hat{y}(x_i) = w(x_i)$; i.e., $\hat{y}(x)$ is an exact interpolator. Such interpolation makes perfect
sense in deterministic simulation.

Note: A regression model determines the optimal $\beta_{\mathrm{reg}}$ in (10.1) through the criterion
of the best linear unbiased estimator (BLUE) of $\beta_{\mathrm{reg}}$, where “best” means “minimum
variance”; this BLUE is $\hat{\beta}_{\mathrm{reg}} = (X_{\mathrm{reg}}' X_{\mathrm{reg}})^{-1} X_{\mathrm{reg}}' w$. The BLUE $\hat{\beta}_{\mathrm{reg}}$ is identical to
the “maximum likelihood” (ML) estimator and the “least squares” (LS) estimator. LS
is a mathematical instead of a statistical criterion; instead of this $L_2$-norm, some mathematical
models use either the $L_1$-norm or the $L_\infty$-norm. The BLUE is not always equal
to the MLE; e.g., the BLUE of $\sigma^2$ has the denominator n - 1, whereas the MLE has n.
Obviously, $\hat{y}_{\mathrm{reg}}(x) = x_{\mathrm{reg}}' \hat{\beta}_{\mathrm{reg}}$ is not an exact interpolator, unless n = q.

Furthermore, we can prove that the optimal A’ in (10.3) is 


1-155! 0(x,)]! 
a (Xo) ri (10.4) 
1X341 


A, = | Ou(Xo) +1 
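where $\boldsymbol{\Sigma}_M$ denotes the $n \times n$ matrix with the covariances between the metamodel’s
“old” outputs $y_i$ (so, $\boldsymbol{\Sigma}_M = (\sigma_{i;i'}) = (\mathrm{Cov}(y_i, y_{i'}))$ with $i, i' = 1, \ldots, n$), and $\boldsymbol{\sigma}_M(\mathbf{x}_0)$
denotes the $n$-dimensional vector with the covariances between the metamodel’s new
output $y_0$ and the $n$ old outputs $y_i$ (so, $\boldsymbol{\sigma}_M(\mathbf{x}_0) = (\sigma_{0;i}) = (\mathrm{Cov}(y_0, y_i))$). Obviously, $\boldsymbol{\Sigma}_M$ is
determined by the old I/O simulation data $(\mathbf{X}, \mathbf{w})$, whereas $\boldsymbol{\sigma}_M(\mathbf{x}_0)$ varies with $\mathbf{x}_0$ (so
we might write $\boldsymbol{\lambda}_o(\mathbf{x}_0)$ if we wanted to point out that $\boldsymbol{\lambda}_o$ varies with $\mathbf{x}_0$, whereas
$\hat{\boldsymbol{\beta}}_{reg}$ remains constant). Furthermore, a stationary process implies that $\lambda_i$ decreases
with the distance between $\mathbf{x}_0$ and $\mathbf{x}_i$. Substituting $\boldsymbol{\lambda}_o$, defined in (10.4), into (10.3),
and using $\mathbf{1}_n$ to denote the $n$-dimensional vector with all elements equal to 1, gives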


$$\hat{y}(\mathbf{x}_0) = \mu + \boldsymbol{\sigma}_M(\mathbf{x}_0)'\boldsymbol{\Sigma}_M^{-1}(\mathbf{w} - \mu\mathbf{1}_n). \tag{10.5}$$


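To denote $\mathrm{Var}(y_i)$ ($= \sigma_{i;i} = \sigma_i^2 = \sigma^2$), we use the symbol $\tau^2$ ($\sigma^2$ is the more usual symbol
in the Kriging literature). Then the mean squared error (MSE) of $\hat{y}(\mathbf{x}_0)$ is

$$\mathrm{MSE}[\hat{y}(\mathbf{x}_0)] = \tau^2 - \boldsymbol{\sigma}_M(\mathbf{x}_0)'\boldsymbol{\Sigma}_M^{-1}\boldsymbol{\sigma}_M(\mathbf{x}_0) + \frac{[1 - \mathbf{1}_n'\boldsymbol{\Sigma}_M^{-1}\boldsymbol{\sigma}_M(\mathbf{x}_0)]^2}{\mathbf{1}_n'\boldsymbol{\Sigma}_M^{-1}\mathbf{1}_n}. \tag{10.6}$$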


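Because $\hat{y}(\mathbf{x}_0)$ is unbiased, $\mathrm{MSE}[\hat{y}(\mathbf{x}_0)]$ reduces to $\mathrm{Var}[\hat{y}(\mathbf{x}_0)]$. If $\mathbf{x}_0 = \mathbf{x}_i$, then
$\hat{y}(\mathbf{x}_0) = w(\mathbf{x}_i)$ and $\mathrm{Var}[\hat{y}(\mathbf{x}_0)] = 0$.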
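
The formulas (10.4) through (10.6) can be checked numerically on a toy example. The following minimal sketch is only an illustration under stated assumptions: the covariance function (a Gaussian function of the distance), the parameter values `tau2` and `theta`, the toy data, and the function names `gaussian_cov` and `ok_predict` are not part of the original text. The sketch also replaces the unknown $\mu$ by its generalized least squares estimate, because only then does the weighted sum $\boldsymbol{\lambda}_o'\mathbf{w}$ of (10.3) coincide with (10.5).

```python
import numpy as np

def gaussian_cov(A, B, tau2, theta):
    """Assumed covariance function: tau2 * exp(-theta * squared distance)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return tau2 * np.exp(-theta * d2)

def ok_predict(X, w, x0, tau2=1.0, theta=5.0):
    """Return the weights (10.4), the predictor (10.5), and the MSE (10.6) at x0."""
    ones = np.ones(len(w))
    Sigma = gaussian_cov(X, X, tau2, theta)                    # Sigma_M
    sigma0 = gaussian_cov(X, x0[None, :], tau2, theta)[:, 0]   # sigma_M(x0)
    Si_sigma0 = np.linalg.solve(Sigma, sigma0)                 # Sigma_M^-1 sigma_M(x0)
    Si_ones = np.linalg.solve(Sigma, ones)                     # Sigma_M^-1 1_n
    slack = 1.0 - ones @ Si_sigma0                             # 1 - 1' Sigma^-1 sigma
    denom = ones @ Si_ones                                     # 1' Sigma^-1 1
    lam = Si_sigma0 + Si_ones * slack / denom                  # weights (10.4)
    mu = (Si_ones @ w) / denom          # GLS estimate of mu (assumption of this sketch)
    y_hat = mu + Si_sigma0 @ (w - mu * ones)                   # predictor (10.5)
    mse = tau2 - sigma0 @ Si_sigma0 + slack**2 / denom         # MSE (10.6)
    return lam, y_hat, mse

X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])   # n = 5 old points, k = 1
w = np.sin(2.0 * np.pi * X[:, 0])                     # toy "simulation" outputs
lam, y_hat, mse = ok_predict(X, w, np.array([0.6]))
```

In this sketch the weights sum to one, and predicting at an old point returns the old output with zero MSE, which is the exact-interpolation property stated in the text.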

It is often convenient to switch from covariances to correlations. The correlation
matrix $\mathbf{R} = (\rho_{i;i'})$ equals $\tau^{-2}\boldsymbol{\Sigma}_M$, and the correlation vector $\boldsymbol{\rho}(\mathbf{x}_0)$ equals $\tau^{-2}\boldsymbol{\sigma}_M(\mathbf{x}_0)$.
There are several types of correlation functions; see (e. g.) [35, pp. 80-104]. In
simulation, the most popular function is the Gaussian correlation function:

$$\rho(\mathbf{h}, \boldsymbol{\theta}) = \prod_{j=1}^{k} \exp(-\theta_j h_j^2) = \exp\left(-\sum_{j=1}^{k} \theta_j h_j^2\right) \quad \text{with } \theta_j > 0 \tag{10.7}$$

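with distance vector $\mathbf{h} = (h_j)$ where $h_j = |x_{g;j} - x_{g';j}|$ and $g, g' = 0, 1, \ldots, n$, and with
$\boldsymbol{\theta} = (\theta_j)$ so $\mathbf{R} = \mathbf{R}(\boldsymbol{\theta})$. We collect the $2 + k$ Kriging (hyper)parameters in $\boldsymbol{\psi} = (\mu, \tau^2, \boldsymbol{\theta}')'$.
We estimate $\boldsymbol{\psi}$ through the maximum likelihood (ML) criterion, which gives the ML
estimator (MLE) $\hat{\boldsymbol{\psi}}$. To compute $\hat{\boldsymbol{\psi}}$, it is convenient to switch to the log-likelihood function;
apart from a constant, minus twice the Gaussian log-likelihood gives the minimization problem

$$\min_{\boldsymbol{\psi}} \; \ln[|\tau^2\mathbf{R}(\boldsymbol{\theta})|] + (\mathbf{w} - \mu\mathbf{1}_n)'[\tau^2\mathbf{R}(\boldsymbol{\theta})]^{-1}(\mathbf{w} - \mu\mathbf{1}_n) \quad \text{with } \boldsymbol{\theta} > \mathbf{0} \tag{10.8}$$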


where $|\mathbf{R}|$ denotes the determinant of $\mathbf{R}$. Solving (10.8) is a mathematical challenge;
e. g., different solutions $\hat{\boldsymbol{\psi}}$ may result from different software packages or from
initializing the same package with different starting values; see [18]. The various software
packages for Kriging standardize the inputs such that the $k$ inputs are limited to a
$k$-dimensional hypercube $[0, 1]^k$.
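
To make the estimation problem (10.8) concrete, the following sketch evaluates the criterion for the Gaussian correlation function (10.7) and searches a small grid of $\theta$ values. It is a rough illustration, not the procedure of any particular Kriging package. The names `gaussian_corr`, `criterion`, `profile_mu_tau2`, and `grid_search`, the toy data, the small diagonal `jitter`, and the restriction to one common $\theta$ for all inputs are assumptions of this sketch. For a fixed $\boldsymbol{\theta}$ the sketch uses the closed-form minimizers of (10.8) over $\mu$ and $\tau^2$ (the generalized least squares mean and the average weighted squared residual).

```python
import numpy as np

def gaussian_corr(X, theta):
    """Correlation matrix R(theta) built from the Gaussian function (10.7)."""
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-(diff2 * theta).sum(axis=2))

def criterion(X, w, mu, tau2, theta, jitter=1e-10):
    """Objective in (10.8); the jitter is only a numerical safeguard (assumption)."""
    n = len(w)
    R = gaussian_corr(X, theta) + jitter * np.eye(n)
    _, logdet_R = np.linalg.slogdet(R)
    resid = w - mu
    # ln|tau2 R| = n ln(tau2) + ln|R|
    return n * np.log(tau2) + logdet_R + resid @ np.linalg.solve(R, resid) / tau2

def profile_mu_tau2(X, w, theta, jitter=1e-10):
    """Closed-form minimizers of (10.8) over mu and tau2 for a fixed theta."""
    n = len(w)
    R = gaussian_corr(X, theta) + jitter * np.eye(n)
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(R, w)) / (ones @ np.linalg.solve(R, ones))
    resid = w - mu
    tau2 = resid @ np.linalg.solve(R, resid) / n
    return mu, tau2

def grid_search(X, w, theta_values):
    """Crude search over one common theta for all k inputs (a simplification)."""
    best = None
    for value in theta_values:
        theta = np.full(X.shape[1], float(value))
        mu, tau2 = profile_mu_tau2(X, w, theta)
        score = criterion(X, w, mu, tau2, theta)
        if best is None or score < best[0]:
            best = (score, mu, tau2, theta)
    return best

rng = np.random.default_rng(0)
X = rng.random((8, 2))                       # 8 points in [0, 1]^2
w = np.sin(3.0 * X[:, 0]) + X[:, 1]          # toy deterministic outputs
score, mu_hat, tau2_hat, theta_hat = grid_search(X, w, [1.0, 5.0, 25.0])
```

A grid search is used here only because it is easy to follow; real software typically uses a numerical optimizer with several starting values, which is exactly where the different solutions mentioned above may arise.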

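In practice, $\hat{\boldsymbol{\psi}}$ is plugged into (10.5) and (10.6), which gives $\hat{y}(\mathbf{x}_0, \hat{\boldsymbol{\psi}})$ and $\mathrm{Var}[\hat{y}(\mathbf{x}_0, \hat{\boldsymbol{\psi}})]$.
Most publications ignore the fact that $\hat{y}(\mathbf{x}_0, \hat{\boldsymbol{\psi}})$ becomes nonlinear in $\mathbf{w}$, and $\mathrm{Var}[\hat{y}(\mathbf{x}_0, \hat{\boldsymbol{\psi}})]$
underestimates the “true” Kriging variance. However, the true Kriging variance
is estimated in [22, pp. 191-197] (and also in [11]), applying the bootstrap method and
the related method called “conditional simulation”. Nevertheless, in this chapter we
shall simply plug in $\hat{\boldsymbol{\psi}}$, and we shall not explicitly display the dependence of $\hat{y}$ and
$\mathrm{Var}(\hat{y})$ on $\hat{\boldsymbol{\psi}}$. This gives

$$s^2[\hat{y}(\mathbf{x}_0)] = \hat{\tau}^2 - \hat{\boldsymbol{\sigma}}_M(\mathbf{x}_0)'\hat{\boldsymbol{\Sigma}}_M^{-1}\hat{\boldsymbol{\sigma}}_M(\mathbf{x}_0) + \frac{[1 - \mathbf{1}_n'\hat{\boldsymbol{\Sigma}}_M^{-1}\hat{\boldsymbol{\sigma}}_M(\mathbf{x}_0)]^2}{\mathbf{1}_n'\hat{\boldsymbol{\Sigma}}_M^{-1}\mathbf{1}_n}. \tag{10.9}$$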


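We combine $\hat{y}(\mathbf{x}_0)$ and $s^2[\hat{y}(\mathbf{x}_0)]$ in the following two-sided confidence interval (CI) with
nominal coverage $1 - \alpha$, where $z_{\alpha/2}$ denotes the $\alpha/2$-quantile of the standard normal $N(0, 1)$:

$$\hat{y}(\mathbf{x}_0) \pm z_{\alpha/2}\, s[\hat{y}(\mathbf{x}_0)]. \tag{10.10}$$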


The actual (true) coverage of this CI may be lower than $1 - \alpha$, because of the following
three factors: (i) the plug-in predictor $\hat{y}(\mathbf{x}_0)$ is biased; (ii) the plug-in variance
estimator $s^2[\hat{y}(\mathbf{x}_0)]$ underestimates $\mathrm{Var}[\hat{y}(\mathbf{x}_0)]$; and (iii) the absolute value of the Gaussian
quantile $|z_{\alpha/2}|$ is lower than the absolute value of the Student quantile with (say) $f$
degrees of freedom $|t_{f;\alpha/2}|$ if $f < \infty$, where $t_{f;\alpha/2}$ with the proper (but unknown) $f$ seems
to be the correct factor for a CI that uses an estimated variance.
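
As a small illustration of (10.10), the sketch below computes the interval from a given predictor value and standard error. The function name `kriging_ci` and the numerical inputs are hypothetical; the quantile comes from the `statistics.NormalDist` class in the Python standard library.

```python
from statistics import NormalDist

def kriging_ci(y_hat, s, alpha=0.10):
    """Two-sided CI (10.10) with nominal coverage 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # equals |z_{alpha/2}|
    return y_hat - z * s, y_hat + z * s

# Hypothetical plug-in values for one new point x0.
lower, upper = kriging_ci(y_hat=2.0, s=0.5, alpha=0.05)
```

Because the Gaussian quantile is used, this interval is subject to the three coverage problems listed above.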

Note: If we replace the constant $\mu = E(y)$ by a trend (e. g., $E(y) = \boldsymbol{\beta}'\mathbf{x}$), then we get
UK; details on UK are found in [22, pp. 197-198] and also in [9] and [33].


10.3 Latin hypercube sampling


LHS is a popular type of space-filling design. Other types are orthogonal array, uni- 
form, maximum entropy, minimax, maximin, integrated mean squared prediction er- 
ror, and “optimal” designs; see [22, p. 198]. Furthermore, [3] discusses so-called bridge 
designs, which are space-filling and may satisfy various criteria and input constraints 
such that the input space is not a k-dimensional cube. Finally, [9] shows that “there 
is substantial variation in prediction accuracy over equivalent designs”. 

Originally, LHS was invented as an alternative for Monte Carlo sampling in risk
analysis or uncertainty analysis through deterministic simulation models that have
(random) uncertain inputs; risk analysis estimates the probability of the output
exceeding a given threshold as a function of an uncertain input $x_j$; for details on risk
analysis we refer to [22, pp. 218-222] and [2]. LHS assumes that an adequate metamodel
is more complicated than a low-order polynomial, but LHS does not assume
a specific metamodel (e. g., a Kriging model). LHS usually assumes that the $k$ inputs
are independently distributed (so their joint distribution is the product of the $k$
individual marginal distributions); in this chapter we also use this assumption. Often
LHS assumes that these distributions are uniform (symbol U) in the interval [0, 1], so
$x_j \sim U(0, 1)$. In risk analysis, however, LHS often assumes a specific non-uniform
distribution for $x_j$ with its mode at $x_{0;j}$, where $x_{0;j}$ denotes the $j$th coordinate of $\mathbf{x}_0$ and
$j = 1, \ldots, k$ (standardization implies $0 \le x_{0;j} \le 1$). This mode may be the value of the
input that the experts think is most likely. There are many non-uniform distributions;
see [26, pp. 286-305]. For example, we may use beta distributions, provided we select
the correct values for the two parameters of the beta distribution; moreover, a different
combination of these parameters gives a different variation around the mode; see [26,
pp. 295-297]. We discuss a special case of the beta distributions; namely, a triangular
distribution with its mode at $x_{m;j}$. We denote this distribution by $T(x_{m;j})$. We shall
detail LHS for $U(0, 1)$ and for $T(x_{m;j})$ later on in this section. More details on LHS can be
found in [22, pp. 198-203]; recent algorithms are detailed in [16, 27].

Whatever the marginal distributions are, LHS with a sample size (number of input
combinations) $n$ defines $n$ mutually exclusive and exhaustive subintervals (or classes)
with equal probability (namely, $1/n$) for $x_j$ with $j = 1, \ldots, k$. We denote these
subintervals by $[l_{g;j}, h_{g;j}]$ with $g = 1, \ldots, n$; the standardization $0 \le x_j \le 1$ implies $l_{1;j} = 0$ and
$h_{n;j} = 1$. Altogether, if $F_j$ denotes the cumulative distribution function (CDF) of $x_j$, then

$$P(l_{g;j} \le x_j < h_{g;j}) = F_j(h_{g;j}) - F_j(l_{g;j}) = \frac{g}{n} - \frac{g - 1}{n} = \frac{1}{n} \tag{10.11}$$

where $\min (g - 1)/n = (1 - 1)/n = 0$, so $F_j(l_{1;j}) = F_j(0) = 0$ because $\min x_j = 0$; likewise,
$\max g/n = n/n = 1$, so $F_j(h_{n;j}) = F_j(1) = 1$ because $\max x_j = 1$. Obviously, (10.11)
implies that near the mode of $T(x_{m;j})$ the subintervals $[l_{g;j}, h_{g;j}]$ are relatively short,
compared with the other subintervals; however, if $x_j \sim U(0, 1)$, then each interval has
the same length $1/n$.

LHS offers the following two options. Option (i) fixes $x_j$ to the $n$ midpoints (symbol
$m_{g;j}$) of its $n$ subintervals, so $x_{g;j} = m_{g;j} = (l_{g;j} + h_{g;j})/2$; e. g., if $x_j \sim U(0, 1)$, then these
midpoints are equispaced with distance $1/n$ over the interval [0, 1] so these midpoints
are $1/(2n), 3/(2n), \ldots, 1 - 1/(2n) = (2n - 1)/(2n)$. Option (ii) samples $x_j$ within its
subinterval, accounting for $F_j$. Option (i) implies that $x_j$ has a discrete PDF, so (10.11) becomes

$$P(x_j = m_{g;j}) = F_j(h_{g;j}) - F_j(l_{g;j}) = \frac{g}{n} - \frac{g - 1}{n} = \frac{1}{n}. \tag{10.12}$$

Furthermore, LHS samples without replacement, so the midpoint $m_{g;j}$ is sampled only
once in the sample of size $n$. We denote the inverse CDF by $F_j^{-1}$, so $y = F_j(x)$ with $0 \le
y \le 1$ implies $x = F_j^{-1}(y)$; we observe that $U(0, 1)$ and $T(x_{m;j})$ imply that $F_j$ is continuous.
Altogether, $\mathbf{x}_j$ is a permutation of the $n$ values $F_j^{-1}(0.5/n), F_j^{-1}(1.5/n), \ldots, F_j^{-1}(1 - 0.5/n)$.
Option (ii) first samples $r \sim U(a, b)$ with $a = (g - 1)/n$ and $b = g/n$ where $g = 1, \ldots, n$,
and then computes

$$x = F_j^{-1}(r) \quad \text{with } r \sim U\left(\frac{g - 1}{n}, \frac{g}{n}\right). \tag{10.13}$$

Algorithm 10.1 is a (pseudo)algorithm for LHS for option (i), which gives the $n \times k$
design matrix $\mathbf{X}_L$ (the subscript L stands for LHS); also see [22, p. 200].


Algorithm 10.1:

1. Read $n$, $k$, $F_j$ ($j = 1, \ldots, k$).

2. Initialize: $j = 1$.

3. Use $F_j$ to divide the range of $x_j$ into $n$ mutually exclusive and exhaustive
intervals of equal probability with midpoint $m_{g;j}$ ($g = 1, \ldots, n$), and find $\mathbf{x}_j =
(m_{1;j}, m_{2;j}, \ldots, m_{n;j})'$.

4. Randomly permute the $n$ elements of $\mathbf{x}_j$, and save the result as column $j$ of $\mathbf{X}_L$.

5. If $j < k$ then $j = j + 1$ and go to Step 3; else stop.


For option (ii), Step 3 becomes: Use $F_j$ to divide the range of $x_j$ into $n$ mutually
exclusive and exhaustive intervals of equal probability, and apply (10.13) to find $\mathbf{x}_j =
(x_{1;j}, \ldots, x_{n;j})'$.

For both options, however, the random permutations in Step 4 may give a “bad”
$\mathbf{X}_L$; to decide on a “good” $\mathbf{X}_L$, our algorithm needs a criterion. We use the
maximin criterion, which maximizes the minimum Euclidean distance between the $n$
($k$-dimensional) points in $[0, 1]^k$. We perform these random permutations (say) $M$
times, and select the “best” design; e. g., MATLAB’s default is $M = 5$.
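
Algorithm 10.1, its variant for option (ii), and the maximin selection over $M$ random designs can be sketched as follows. This is a minimal illustration and not the code of any LHS package; the names `lhs`, `min_distance`, and `maximin_lhs` and the parameter `tries` (playing the role of $M$) are assumptions. For option (i) the sketch uses the values $F_j^{-1}((g - 0.5)/n)$ given in the text, and for option (ii) it applies (10.13).

```python
import math
import random

def lhs(n, k, inv_cdfs, midpoints=True, rng=random):
    """Return an n x k LHS design as a list of rows.

    inv_cdfs[j] is the inverse CDF of input j; midpoints=True gives option (i),
    midpoints=False gives option (ii).
    """
    columns = []
    for j in range(k):
        if midpoints:
            probs = [(g - 0.5) / n for g in range(1, n + 1)]
        else:
            # r ~ U((g - 1)/n, g/n), one draw per subinterval, as in (10.13)
            probs = [(g - 1 + rng.random()) / n for g in range(1, n + 1)]
        column = [inv_cdfs[j](p) for p in probs]
        rng.shuffle(column)                     # Step 4: random permutation
        columns.append(column)
    return [list(row) for row in zip(*columns)]

def min_distance(design):
    """Smallest Euclidean distance between any two points of the design."""
    return min(math.dist(a, b)
               for i, a in enumerate(design) for b in design[i + 1:])

def maximin_lhs(n, k, inv_cdfs, tries=5, midpoints=True, seed=None):
    """Generate `tries` designs and keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    designs = [lhs(n, k, inv_cdfs, midpoints, rng) for _ in range(tries)]
    return max(designs, key=min_distance)

uniform = lambda p: p                           # inverse CDF of U(0, 1)
design = maximin_lhs(10, 2, [uniform, uniform], tries=5, seed=1)
```

Each column of `design` is a permutation of the ten midpoints, so projecting the points onto either axis gives ten distinct values.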

LHS does not impose a strict mathematical relationship between $n$ and $k$. We
observe that in Kriging $n \ge 10$ implies that we can estimate the correlation parameters $\theta_j$
with $j = 1, \ldots, k$ (see (10.7)) reasonably accurately, because LHS implies that projection
of the $n$ ($k$-dimensional) points onto the $k$ individual axes gives $n$ non-collapsing
values per axis. If LHS uses a “small” $n$ and a “large” $k$, then LHS covers $[0, 1]^k$ sparsely
(so there are only a few old points close to the new point) and the Kriging predictor
may be inaccurate. We note that [28] gives the rule-of-thumb $n = 10k$ for LHS if Kriging
has SA as its goal; so, $n \ge 10$ if $k \ge 1$; [28] is revisited in [19].

There is much software for LHS. For example, Microsoft’s Excel spreadsheet software
has add-ins that include LHS; LHS is also included in Oracle’s Crystal Ball,
Palisade’s @Risk, and Frontline Systems’ Risk Solver. LHS is also available in the MATLAB
Statistics toolbox, in R packages, and in Sandia’s DAKOTA software. An interesting
website is http://www.spacefillingdesigns.nl/. However, some software (e. g., MATLAB)
does not allow a non-uniform distribution such as $T(x_{m;j})$, so we now present LHS for
$U(0, 1)$ and $T(x_{m;j})$; for details we refer to [24].

The CDF of $U(0, 1)$ is $F_{U;j}(x) = x$ if $0 \le x \le 1$. This CDF and (10.12) imply that option
(i) samples $U(0, 1)$ through

$$x_{g;j} = m_{g;j} \quad \text{if } l_{g;j} \le x_{g;j} < h_{g;j} \quad (g = 1, \ldots, n).$$

This CDF and (10.13) imply that option (ii) samples $U(0, 1)$ through the PRN $r \sim U(0, 1)$
and computes

$$x_{g;j} = l_{g;j} + r(h_{g;j} - l_{g;j}) \quad \text{if } l_{g;j} \le x_{g;j} < h_{g;j} \quad (g = 1, \ldots, n).$$

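The CDF of $T(x_{m;j})$ is (see [26, pp. 304-305]):

$$F_{T;j}(x) = \frac{x^2}{x_{m;j}} \quad \text{if } 0 \le x \le x_{m;j} \tag{10.14}$$

$$F_{T;j}(x) = 1 - \frac{(1 - x)^2}{1 - x_{m;j}} \quad \text{if } x_{m;j} \le x \le 1,$$

so this CDF switches between its two quadratic branches at $x_{m;j}$, where the corresponding PDF
has a kink. Combining this equation with (10.12), option (i) samples

$$x_{g;j} = \sqrt{\frac{x_{m;j}(2g - 1)}{2n}} \quad \text{with } x_{g;j} < x_{m;j} \quad (g = 1, \ldots, n), \tag{10.15}$$

$$x_{g;j} = 1 - \sqrt{\frac{(1 - x_{m;j})[2n - (2g - 1)]}{2n}} \quad \text{with } x_{g;j} \ge x_{m;j}.$$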


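Option (ii) samples $x_j$ (within the specific subinterval) by inverting the first line of
(10.14) if $h_{g;j} < x_{m;j}$. If $l_{g;j} > x_{m;j}$, then it samples $x_j$ by inverting the second line of
(10.14). If $l_{g;j} < x_{m;j} < h_{g;j}$, then it first samples the PRN $r$ as in (10.13), so $r$ lies
between $(g - 1)/n$ and $g/n$; if $r < x_{m;j}$, then it samples $x_j$ via the first line of (10.14);
if $r > x_{m;j}$, then it samples $x_j$ via the second line of (10.14). Comparing $r$ with $x_{m;j}$ is
valid because $F_{T;j}(x_{m;j}) = x_{m;j}$. Altogether, option (ii) samples

$$x_{g;j} = \sqrt{x_{m;j}\, r} \quad \text{if either } x_{m;j} > h_{g;j} \text{ or } l_{g;j} < x_{m;j} < h_{g;j} \text{ and } r < x_{m;j} \tag{10.16}$$

$$x_{g;j} = 1 - \sqrt{(1 - x_{m;j})(1 - r)} \quad \text{if either } x_{m;j} < l_{g;j} \text{ or } l_{g;j} < x_{m;j} < h_{g;j} \text{ and } r > x_{m;j}$$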


Both options are compared in [24]. 
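
The triangular formulas can be verified with a few lines of code. The sketch below implements the CDF (10.14), its inverse (which underlies (10.15) and (10.16)), and the option (i) values of (10.15). The function names are illustrative assumptions, and the sketch assumes a mode strictly between 0 and 1.

```python
import math

def tri_cdf(x, mode):
    """CDF (10.14) of the triangular distribution on [0, 1]; assumes 0 < mode < 1."""
    if x <= mode:
        return x * x / mode
    return 1.0 - (1.0 - x) ** 2 / (1.0 - mode)

def tri_inv_cdf(p, mode):
    """Inverse of (10.14); the branches meet at p = mode because F(mode) = mode."""
    if p <= mode:
        return math.sqrt(mode * p)
    return 1.0 - math.sqrt((1.0 - mode) * (1.0 - p))

def tri_midpoints(n, mode):
    """Option (i) values (10.15): the inverse CDF at (2g - 1)/(2n) for g = 1..n."""
    return [tri_inv_cdf((2 * g - 1) / (2 * n), mode) for g in range(1, n + 1)]

points = tri_midpoints(5, 0.3)   # hypothetical example: n = 5, mode 0.3
```

Passing `lambda p: tri_inv_cdf(p, mode)` as the inverse CDF to any LHS routine that samples probabilities within the $n$ subintervals gives option (ii) of (10.16).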

We observe that [23] develops a method (or algorithm) that uses a sequential design
for Kriging in deterministic simulation aimed at SA. That method considers (but
does not yet simulate) a set of candidate combinations selected through LHS, and
finds the candidate with the highest estimated predictor variance. That method is 
also summarized in [22, pp. 204-206]. For constrained optimization in deterministic 
simulation, [25] also uses Kriging and a sequential design based on LHS for selecting 
candidate combinations, and finds the candidate with the highest criterion value; see 
Section 10.6.1. 


10.4 Random simulation: Kriging analysis and 
experimental design 


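In random simulation we wish to control the noise of the simulation output at point
$\mathbf{x}_i$ ($i = 1, \ldots, n$), so we obtain (say) $m_i \ge 1$ replications; we shall return to the choice of
a value for $m_i$. To analyze the I/O data of a random simulation, [1] develops stochastic
Kriging (SK). This SK adds the intrinsic noise term $\varepsilon_r(\mathbf{x}_i) \sim N(0, \mathrm{Var}[\varepsilon(\mathbf{x}_i)])$ for
replication $r$ ($r = 1, \ldots, m_i$). After averaging over the $m_i$ replications, SK uses the formulas
for the OK predictor $\hat{y}$ in (10.5) and $\mathrm{Var}(\hat{y})$ in (10.6), but replaces $\mathbf{w}$ by $\bar{\mathbf{w}}$ and $M(\mathbf{x}_i)$ by
$M(\mathbf{x}_i) + \bar{\varepsilon}(\mathbf{x}_i)$ where $\bar{\varepsilon}(\mathbf{x}_i) \sim N(0, \mathrm{Var}[\varepsilon(\mathbf{x}_i)]/m_i)$ and $\bar{\varepsilon}(\mathbf{x}_i)$ is assumed to be
independent of $M(\mathbf{x})$. We use the symbol $\boldsymbol{\psi}_{+\varepsilon}$ to denote $\boldsymbol{\psi}$ augmented with $\mathrm{Var}[\varepsilon(\mathbf{x}_i)]$. The SK
predictor $\hat{y}(\hat{\boldsymbol{\psi}}_{+\varepsilon})$ is not an exact interpolator anymore, which makes sense in random
simulation, which gives only estimates of the true simulation outputs. Obviously, the
covariance matrix of the average intrinsic noises, $\boldsymbol{\Sigma}_{\bar{\varepsilon}}$, is
diagonal if no common random numbers (CRN) are used (if we applied CRN and used
$m_i = m$, then $\boldsymbol{\Sigma}_{\bar{\varepsilon}} = \boldsymbol{\Sigma}_{\varepsilon}/m$; however, we assume no CRN in this section). To estimate
$\mathrm{Var}[\varepsilon(\mathbf{x}_i)]$, SK may use the classic unbiased estimator

$$s^2(w_i) = \frac{\sum_{r=1}^{m_i}(w_{i;r} - \bar{w}_i)^2}{m_i - 1} \quad (i = 1, \ldots, n). \tag{10.17}$$
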
However, these $s^2(w_i)$ are rather noisy estimators, so SK may use a second Kriging
metamodel for $\mathrm{Var}[\varepsilon(\mathbf{x}_i)]$, besides the Kriging metamodel for the mean $E[y(\mathbf{x}_i)]$. This
second metamodel is only a rough approximation, because $s^2(w_i)$ is not normally
distributed. The transformation $\log[s^2(w_i)]$ may give a normal distribution. For more
details we refer to [22, p. 208].
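
The quantities that SK needs per old point follow directly from the replicated outputs. The sketch below computes the average $\bar{w}_i$, the estimator (10.17), and the estimated variance of the average, $s^2(w_i)/m_i$, which would form the diagonal of the estimated intrinsic covariance matrix when no CRN are used. The function name and the toy data are hypothetical; `statistics.variance` uses the denominator $m_i - 1$, so each point needs $m_i > 1$ replications.

```python
from statistics import mean, variance

def replication_summary(outputs):
    """outputs[i] holds the m_i replicated outputs w_{i;r} at point x_i (m_i > 1).

    Returns one tuple (w_bar_i, s2_i, s2_i / m_i) per point.
    """
    summary = []
    for reps in outputs:
        m = len(reps)
        w_bar = mean(reps)            # average output at x_i
        s2 = variance(reps)           # estimator (10.17), denominator m_i - 1
        summary.append((w_bar, s2, s2 / m))
    return summary

# Hypothetical replicated outputs at n = 2 points with m_1 = 3 and m_2 = 4.
summary = replication_summary([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0, 4.0]])
```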

An alternative for SK is hetGP developed in [5]. This alternative assumes $m_i \ge 1$,
whereas SK assumes $m_i > 1$. Whereas SK gives a biased $\hat{\boldsymbol{\psi}}_{+\varepsilon}$ because SK fits Kriging
models for the mean and the intrinsic variances independently, hetGP couples these
models through a joint likelihood for $\boldsymbol{\psi}_{+\varepsilon}$ that is optimized in one shot. This alternative
requires computational time of the same order as SK does. A recent alternative for SK
is developed in [45].

The output of a random simulation may be a quantile (not an average); e. g., a
quantile may be relevant in chance-constrained optimization. References are given in 
[22, p. 208]. 

In our preceding discussion of Kriging, we assumed $m_i \ge 1$ replications at the
point $\mathbf{x}_i$ with $i = 1, \ldots, n$. In practice, we must decide how many points and which points
to simulate, and how many replications to obtain for these points. Unfortunately, there
are no simple solutions for this problem. Actually, [6] and [41] present sequential
designs using the criterion called the integrated mean squared prediction error (IMSPE),
which integrates $\mathrm{Var}[\hat{y}(\mathbf{x})]$ for $\mathbf{x}$ in $[0, 1]^k$. Obviously, this criterion assumes that the
goal of Kriging is SA. Furthermore, [42] extends [5], using advanced mathematical
analysis. If the goal of Kriging is optimization, then other criteria may be used (see
Section 10.6). The main problem is that $m_i$ with $i = 1, \ldots, n$ should depend on both $\boldsymbol{\Sigma}_M$
(external noise) and $\boldsymbol{\Sigma}_{\varepsilon}$ (internal noise); also see [30].

Note: We observe that [10] allows some inputs to be qualitative. Furthermore, [47] 
uses SK, accounting for the discrepancy between real observations and simulated ob- 
servations, possibly using CRN. Finally, [37] uses “generalized integrated Brownian 
fields” as simulation metamodels. 


10.5 Sensitivity analysis 


SA may be one of the goals of simulation modeling and Kriging. SA may be either 
global or local. SA is also closely related to “what if analysis”, “gaining insight”, and
“prediction”. So, in practice, goals may be known under different names, and may be
ambiguous. For further discussion of various goals we refer to [22, p. 9]. 

In this chapter we give the following definition: SA quantifies how the simulation
output changes, as one or more simulation inputs change. To start our discussion of
SA, we assume that the simulation is deterministic and that we change only one of the
$k$ inputs and that this single input (say) $x_j$ is continuous. Then we can quantify this
sensitivity at a given point $\mathbf{x}$ (so the SA is local) through $\partial w/\partial x_j|_{\mathbf{x}}$. This local SA is
simplest if we can adequately approximate the I/O behavior of the simulation model
by a first-order polynomial in $\mathbf{x}$, so $\partial w/\partial x_j|_{\mathbf{x}} = \beta_j$. If we use a second-order polynomial,
then interactions between $x_j$ and the other $k - 1$ inputs play a role and the marginal
effect of $x_j$ is not constant.

Instead of local SA we may perform global SA: how does the simulation output
change, as one or more simulation inputs change over the whole area of interest
$[0, 1]^k$? Moreover, “the” output may have a distribution (instead of a single value) if
the deterministic simulation has one or more inputs that are uncertain so we assume
a prespecified distribution for these uncertain inputs (as in LHS; see Section 10.3).
Now we discuss global sensitivity analysis (GSA) or functional analysis of variance
(FANOVA), which uses variance-based indices originally proposed by the Russian
mathematician Sobol.

FANOVA decomposes $\sigma_w^2$, the variance of the random simulation output $w$, into
fractions that refer to sets of inputs:

$$\sigma_w^2 = \sum_{j=1}^{k} \sigma_j^2 + \sum_{j<j'} \sigma_{j;j'}^2 + \cdots + \sigma_{1;\ldots;k}^2 \tag{10.18}$$

with the main-effect variance $\sigma_j^2 = \mathrm{Var}[E(w|x_j)]$, the two-factor interaction variance
$\sigma_{j;j'}^2 = \mathrm{Var}[E(w|x_j, x_{j'})] - \mathrm{Var}[E(w|x_j)] - \mathrm{Var}[E(w|x_{j'})]$, etc. This $\sigma_j^2$ gives the first-order
sensitivity index or the main-effect index $\zeta_j = \sigma_j^2/\sigma_w^2$, which quantifies the effect of varying
$x_j$ alone, averaged over the variations in all the other $k - 1$ inputs, where the
denominator $\sigma_w^2$ standardizes $\zeta_j$ to provide a fractional contribution. Altogether the sum of
the $2^k - 1$ indices is 1. In practice, we assume that only the $\zeta_j$ (and possibly the $\zeta_{j;j'}$) are
important, and that they sum up to a fraction “close enough” to 1.

To estimate these measures, we may use LHS and replace the simulation model by 
a Kriging metamodel; see [22, pp. 216-218]. In practice, FANOVA may show that (say)
70 % of $\sigma_w^2$ is caused by $x_1$, 20 % by $x_2$, and 10 % by the interaction between $x_1$ and $x_2$.
Software for FANOVA is included in OpenTURNS, discussed in [2]; a MATLAB toolbox
is presented in [34]. SA is the topic of many recent publications; see the website with 
corrections and additions for [22] that was mentioned in Section 10.1. 
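
For a cheap function of two independent $U(0, 1)$ inputs, the definitions behind (10.18) can be evaluated directly. The sketch below approximates the main-effect variances $\mathrm{Var}[E(w|x_j)]$ on a grid of interval midpoints and returns the two main-effect indices plus the remaining interaction index. It only illustrates the definitions; it is not the estimation procedure of [22] or of any FANOVA software, and the function name, the grid size, and the test function are assumptions.

```python
from statistics import fmean, pvariance

def first_order_indices(f, grid_size=100):
    """Main-effect indices of w = f(x1, x2) with independent U(0, 1) inputs.

    A grid of interval midpoints serves as a simple numerical integration rule.
    """
    pts = [(g + 0.5) / grid_size for g in range(grid_size)]
    table = [[f(a, b) for b in pts] for a in pts]
    total_var = pvariance([v for row in table for v in row])   # sigma_w^2
    cond_mean_1 = [fmean(row) for row in table]                # E(w | x1)
    cond_mean_2 = [fmean(col) for col in zip(*table)]          # E(w | x2)
    zeta_1 = pvariance(cond_mean_1) / total_var
    zeta_2 = pvariance(cond_mean_2) / total_var
    return zeta_1, zeta_2, 1.0 - zeta_1 - zeta_2   # remainder: interaction (k = 2)

# Additive toy function: no interaction, so the two main effects explain everything.
zeta_1, zeta_2, zeta_12 = first_order_indices(lambda x1, x2: x1 + 2.0 * x2)
```

For this additive function the variances are proportional to 1 and 4, so the indices are 0.2 and 0.8 and the interaction index vanishes. For the product $x_1 x_2$ the same routine gives main-effect indices close to 3/7 each and an interaction index close to 1/7.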


10.6 Optimization 


There are many methods for the optimization of a simulated system; see the references 
in [22, p. 242]. Some methods require relatively many input combinations; e. g., evolu- 
tionary algorithms (EAs) and particle swarm optimization (PSO) do. We may therefore 
apply these methods to a computationally cheap metamodel of the underlying compu- 
tationally expensive simulation model; e. g., PSO and Kriging are combined in [20]. In 
this chapter, however, we focus on efficient global optimization (EGO) in Section 10.6.1 
and robust optimization (RO) in Section 10.6.2. 


10.6.1 Efficient global optimization 


Originally, [21] developed EGO, which uses Kriging. EGO is a sequential method that 
balances local and global search; i.e., EGO balances exploitation and exploration. We 
detail only the basic EGO-variant for deterministic simulation. The goal of this variant 
is to estimate the input combination $\mathbf{x}_o$ that minimizes the simulation output $w(\mathbf{x})$.
We start with an initial or pilot sample of input combinations $\mathbf{x}_i$ with $i = 1, \ldots, n$,
selected through LHS. We use these $\mathbf{x}_i$ as input for the simulation model, which
gives $(\mathbf{x}_i, w_i)$ or $(\mathbf{X}_n, \mathbf{w}_n)$. We then find the best simulated output so far: $w_{\min} =
\min_{1 \le i \le n} w(\mathbf{x}_i)$.

Next we select a new input combination, considering both $\hat{y}$ and $s(\hat{y})$; e. g., if the
two combinations $\mathbf{x}$ and $\mathbf{x}'$ have $\hat{y}(\mathbf{x}) = \hat{y}(\mathbf{x}')$ and $s[\hat{y}(\mathbf{x})] > s[\hat{y}(\mathbf{x}')]$, then we prefer $\mathbf{x}$
because $\mathbf{x}$ has a higher probability of improvement (or PI) so it may give a smaller $w$.
We know that $s[\hat{y}(\mathbf{x}_{n+1})]$ increases as $\mathbf{x}_{n+1}$ lies farther away from $\mathbf{x}_i$ ($i = 1, \ldots, n$); see
(10.6). Actually, we estimate the maximum of the expected improvement (EI), which is
reached if either $\hat{y}(\mathbf{x}_{n+1})$ is much smaller than $w_{\min}$ or $s[\hat{y}(\mathbf{x}_{n+1})]$ is relatively large so
$\hat{y}(\mathbf{x}_{n+1})$ is relatively uncertain.

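To obtain $\hat{y}$ and $s(\hat{y})$, we fit a Kriging metamodel. This gives $\mathrm{EI}(\mathbf{x}) = E[\max(w_{\min} -
\hat{y}(\mathbf{x}), 0)]$. Let $\Phi$ and $\phi$ denote the cumulative distribution function (CDF) and the PDF
of the standard normal variable $z$. Then [21] derives that the estimated EI for the input
combination $\mathbf{x}$ is

$$\widehat{\mathrm{EI}}(\mathbf{x}) = (w_{\min} - \hat{y}(\mathbf{x}))\,\Phi\left(\frac{w_{\min} - \hat{y}(\mathbf{x})}{s[\hat{y}(\mathbf{x})]}\right) + s[\hat{y}(\mathbf{x})]\,\phi\left(\frac{w_{\min} - \hat{y}(\mathbf{x})}{s[\hat{y}(\mathbf{x})]}\right). \tag{10.19}$$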


Using (10.19), we find the $\mathbf{x}$ that maximizes $\widehat{\mathrm{EI}}(\mathbf{x})$; we denote this optimal combination
by $\hat{\mathbf{x}}_o$ (with the subscript “o” for “optimal”). To find this $\hat{\mathbf{x}}_o$, we may use a relatively
large set of candidate input combinations that are selected through LHS (say) $\mathbf{X}_{cand}$;
we do not simulate these candidates, but we find the candidate with the highest $\widehat{\mathrm{EI}}(\mathbf{x})$
with $\mathbf{x} \in \mathbf{X}_{cand}$.

We use this candidate $\hat{\mathbf{x}}_o$ as the input combination that is simulated next, and
obtain $w(\hat{\mathbf{x}}_o)$. Then we re-estimate the Kriging metamodel from the I/O data $(\mathbf{X}_n, \mathbf{w}_n)$
augmented with $(\hat{\mathbf{x}}_o, w(\hat{\mathbf{x}}_o))$. We update $n$, and return to (10.19) until we satisfy a
stopping criterion; e. g., $\widehat{\mathrm{EI}}(\hat{\mathbf{x}}_o)$ is “close” to 0 or the computer budget is exhausted.
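
The estimated EI of (10.19) and the selection of the next input combination from a candidate set can be sketched as follows. The function names, the callback `predict` (which stands for a fitted Kriging metamodel returning the predictor and its standard error), and the numerical values are hypothetical.

```python
import math

def expected_improvement(w_min, y_hat, s):
    """Estimated EI (10.19) for minimization, given predictor y_hat and std. error s."""
    if s <= 0.0:                      # old point: no prediction uncertainty left
        return max(w_min - y_hat, 0.0)
    z = (w_min - y_hat) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (w_min - y_hat) * cdf + s * pdf

def next_point(candidates, predict, w_min):
    """Return the candidate with the highest estimated EI.

    predict(x) -> (y_hat, s) is a hypothetical stand-in for the Kriging metamodel.
    """
    return max(candidates, key=lambda x: expected_improvement(w_min, *predict(x)))

# Two candidates with the same predictor value: the more uncertain one has higher EI.
ei_small_s = expected_improvement(w_min=1.0, y_hat=1.0, s=0.5)
ei_large_s = expected_improvement(w_min=1.0, y_hat=1.0, s=1.0)
```

In a full EGO loop the chosen candidate would be simulated, the metamodel refitted, and `w_min` updated before the next call to `next_point`.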

There are many more EGO-variants, for deterministic simulation and random 
simulation, constrained optimization, multi-objective optimization including Pareto 
frontiers, RO, the “excursion set” or “admissible set”, estimation of a quantile, and 
Bayesian approaches; see [22, pp. 267-269]. For example, a variant for constrained op- 
timization in deterministic simulation—with one goal output and several constrained 
outputs—is developed in [25]. 


10.6.2 Robust optimization 


In [15] Krige’s (meta)model and Taguchi’s world view are combined. Taguchi de- 
signed robust engineering products such as cars (at Toyota), emphasizing that in 
practice some inputs are under complete control of the engineers (e. g., the car’s 
design), whereas other inputs are not (the car’s driver, and the roads). He therefore 
distinguished between (i) controllable or decision variables, and (ii) noncontrollable 
or environmental noisy (or random) factors. So, the estimated optimum (see the
preceding section) may turn out to be inferior because this optimum ignores uncertainties
in some of the simulation inputs. Therefore we may proceed as follows.

To simplify our discussion, we sort the $k$ simulation inputs such that the first
$k_C$ inputs are controllable, and the next $k_{NC}$ inputs are noncontrollable. We let $\mathbf{z}_C$
and $\mathbf{z}_{NC}$ denote the vectors with the $k_C$ controllable and the $k_{NC}$ noncontrollable
original (nonstandardized) inputs $z$. Taguchi assumes a single output (say) $w$, and
focuses on its mean $E(w)$ and its variance; obviously, this variance is caused by $\mathbf{z}_{NC}$,
so $\mathrm{Var}(w|\mathbf{z}_C) > 0$. Taguchi combines $E(w)$ and $\mathrm{Var}(w|\mathbf{z}_C)$ into a scalar loss function
such as the signal-to-noise or mean-to-variance ratio $E(w)/\mathrm{Var}(w|\mathbf{z}_C)$; see the detailed
discussion of Taguchi’s approach in [31, pp. 486-488], which is the classic textbook
on so-called response surface methodology (RSM).

Taguchi’s approach is successful in production engineering, but statisticians crit- 
icize its statistical methods. We add that—compared with real-life experiments (dis- 
cussed in [31])—simulation experiments have more inputs, more input values, and 
more input combinations. Actually, [31, pp. 502-506] combines Taguchi’s worldview 
with the statisticians’ RSM. 

However, [15] uses $E(w)$ and $\sigma(w|\mathbf{z}_C)$ separately; obviously, $\sigma(w|\mathbf{z}_C)$ has the same
scale as $E(w)$ has. Mathematical optimization (MO) can then be used to solve the
constrained optimization: $\min_{\mathbf{z}_C} E(w|\mathbf{z}_C)$ such that $\sigma(w|\mathbf{z}_C) \le c_\sigma$, where $c_\sigma$ is a
prespecified upper threshold for $\sigma$. Constrained optimization is also discussed in [31, p. 492].
Whereas [31] superimposes contour plots for $E(w|\mathbf{z}_C)$ and $\sigma(w|\mathbf{z}_C)$ to estimate the
optimal $\mathbf{z}_C$, [15] uses MO.

This MO, however, requires specification of $c_\sigma$ (threshold for $\sigma$). In practice,
managers may find it hard to select a specific value for $c_\sigma$. Therefore we try different $c_\sigma$
values, and estimate the corresponding Pareto-optimal efficiency frontier. This frontier
is uncertain because it depends on the estimators of $E(w|\mathbf{z}_C)$ and $\sigma(w|\mathbf{z}_C)$. To estimate
the variability of the frontier, we may apply bootstrapping. For details on this type of
RO we refer to Dellino et al. [15], which is summarized in [22, pp. 280-284].
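
The idea of trying several thresholds $c_\sigma$ can be illustrated on a finite set of candidate values for the controllable inputs. The sketch below replaces the MO solver by simple enumeration and assumes that estimates of $E(w|\mathbf{z}_C)$ and $\sigma(w|\mathbf{z}_C)$ are already available per candidate (e. g., from metamodels); the function names and all numbers are hypothetical, and the sketch does not reproduce the procedure of [15].

```python
def robust_choice(candidates, c_sigma):
    """Minimize the estimated mean subject to estimated sigma <= c_sigma.

    candidates: list of tuples (label, mean_hat, sigma_hat); returns None if infeasible.
    """
    feasible = [c for c in candidates if c[2] <= c_sigma]
    return min(feasible, key=lambda c: c[1]) if feasible else None

def estimated_frontier(candidates, thresholds):
    """Collect the distinct constrained optima over a range of thresholds."""
    frontier = []
    for c_sigma in sorted(thresholds):
        best = robust_choice(candidates, c_sigma)
        if best is not None and best not in frontier:
            frontier.append(best)
    return frontier

# Hypothetical estimates (label, mean, sigma) for four controllable-input combinations.
candidates = [("a", 10.0, 1.0), ("b", 8.0, 2.0), ("c", 7.0, 4.0), ("d", 9.0, 3.0)]
frontier = estimated_frontier(candidates, [0.5, 1.0, 2.0, 3.0, 4.0])
```

In this toy example combination "d" never appears on the frontier because "b" has both a lower mean and a lower standard deviation. Repeating the computation on bootstrapped estimates would indicate how variable such a frontier is.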

An application of RO using Kriging to estimate the Pareto frontier is [46]. Kriging 
for RO is also used in [8], comparing this approach with several alternative metamodel 
types (e. g., neural networks). 

Note that [44] presents an approach that follows RO as developed in MO (instead 
of Taguchian publications). RO in MO was originally developed by the Israeli mathe- 
matician Ben-Tal and uses concepts such as “uncertainty sets”, “robust counterparts”, 
and “adjustable decision rules”. The approach in [44] does not need a known distri- 
bution for the environmental inputs; i. e., this approach uses only experimental data 
combined with so-called “phi-divergence” uncertainty sets. This approach is applied 
to low-order polynomial metamodels, but may be extended to Kriging metamodels 
(without introducing additional complexity to the problem formulation). RO for both 
linear-regression and Kriging metamodels is detailed in [39]. 


10.7 Conclusions and outlook


We provided an overview of a specific type of metamodel; namely, Kriging or Gaus- 
sian process (GP). Kriging is popular in geostatistics and machine learning, and is also 
gaining popularity in the analysis of simulation experiments. However, many issues 
remain in Kriging; e. g., should we use a simple constant mean or a low-order polyno- 
mial trend? Should we select a Gaussian or a Matérn correlation function? How should 
we estimate the intrinsic variance of the simulation output for new input combina- 
tions of the simulation model? Kriging in random simulation needs further research 
on sequential designs for the selection of additional input combinations and the num- 
ber of replications for old and new combinations. Kriging is used in classic “efficient 
global optimization” (EGO); currently, many researchers also investigate random sim- 
ulation and constrained optimization. Robust optimization (RO) has just started in 
simulation, whereas it is a hot topic in Mathematical Optimization (MO). The appli- 
cation of various types of Kriging for sensitivity analysis and optimization remains 
challenging. 

Future research may try to solve the curse of dimensionality in Kriging; currently, 
the number of inputs is usually limited to (say) 20. To solve this problem, we may pre- 
cede Kriging by factor screening; several screening methods are presented in Chapter 4 
of [22]. Other solutions are discussed in [7, 43]. 

Another topic of future research is big data, in which the number of input/output 
combinations is so high that the computation of the inverse of the estimated covari- 
ance matrix is impossible. Various solutions are discussed in [17, 24]. 